We love Excel, and we use it here at Trifacta all the time, particularly the VLOOKUP function. But we know VLOOKUP is also sometimes painful and cumbersome to use when you’re in a hurry to finish that model your business partner needs or that report everyone seems to be waiting on. Some of the limitations you’ve probably already run into include:
Rigid data structure required
Values being “looked up” must be to the left of the column you want results in, and return values can only be to the right–when your work is structured differently this is a major inconvenience.
Manual typing/copying is error prone
Data sources have different styles or formats, like date formats. So each time you import new data, you have to reformat it all over again. Copying the formula to multiple columns is error prone, and each time you import new data, you run that risk all over again.
Works only for simple data matching
VLOOKUP can only use one lookup value. If you need to use two, it’s difficult to set up and prone to error. Case is disregarded, so you can’t parse on that. In addition, VLOOKUP uses an approximate match as default instead of exact match, so you may get inaccurate data and not know it, especially in large files. Using approximate match, you’ll have to sort in order so that the first result is the true result. And each time you import data, you have to remember to sort (or write a macro).
Hard to share, maintain, and consume colleagues’ efforts
Today, you sort the sheet one way; tomorrow, your colleague sorts another way. Each time, the source data you use is being altered. When a new person arrives, they start all over again. Ultimately, it becomes impossible to read anyone else’s models, let alone maintain and support them. Version control can be a serious problem … and we’ve all been there when the backup failed.
Troubleshooting is extremely difficult
There’s no such thing as checking a VLOOKUP while you’re building it. You have to write the formula, hope you’ve done it perfectly, and let it run. Any mistakes you’ve built in will be hard to isolate, a situation that becomes even worse when you’re using a data source you’re even slightly unfamiliar with or that might contain errors.
Working across datasets is painful
Building a VLOOKUP that will run across separate Excel files is theoretically possible, but so hard to do that it’s easier to simply combine the files (usually through copying and pasting sheets)—which will only work until the datasets become too big to open.
Cartesian limits and speed
Big data crunches and macros can slow down your machine (sometimes even just opening the file), and in some cases can’t be run at all; files past a certain size can’t be manipulated in Excel due to cartesian limits.
Big data getting bigger
The nature of some data sources means they grow in size on a yearly, monthly, or even daily basis. Eventually, your data ends up too big for Excel to handle.
VLOOKUP: powerful, but frustrating. As your business matures and the demands on your analysts grow, you need a tool that’ll scale with them—something robust but flexible, and able to handle a mounting array of disparate sources effectively. What’s the way forward? Find out in Part Two of our exploration coming out next week. Until then, you can start wrangling with Trifacta.