While sitting in a cubicle, doing the kind of work you would expect I’d be doing in a cubicle at a large company, I got a Gchat message from my wife. It wasn’t one of her normal Gchat questions—she was asking why her VLOOKUP in Excel was not working. Now, my wife is no dummy. Admittedly, she is much smarter than I am. So, I was loving the fact that she needed my help with a spreadsheet. We got on the phone and I started going through the checklist of things that could be causing her VLOOKUP not to work. Frustrated, she said things like, “You mean it matters what order my columns are in? And the casing? What order does the formula go in? This is stupid.”
Yes, it is stupid. I spent a lot time in my career pulling my hair out trying to figure out why my VLOOKUP wasn’t working. I used to work as an analyst at one of the world’s largest CPG companies, and Excel was often no match for the size and variety of the data we leveraged. We struggled through endless hours of Excel, and then Access, until finally seeking out a modern approach to data prep, Trifacta. In comparison, using traditional tools like Excel and Access felt like watching a movie on a VHS; Trifacta was my Netflix.
In this post, I’ll walk you through some of the common frustrations that I faced using Excel and Access to build sales forecasting reports, and how Trifacta made my life easier.
Stage one: Buried in Excel Hell
When I was an analyst, I was put on a team that was going to pilot a new forecasting program. The idea was to take retail customer data and combine it with our own internal data to increase forecast accuracy. As you can imagine, this was a tall order since none of the data we were blending together was an easy 1:1 ratio—dates, item codes, and even units of measure needed to be converted so they could easily be linked together. Additionally, the data was stored in multiple databases and the customer data had to manually be downloaded from the customer’s portal.
Although we could export most of the data into Excel files, Excel wasn’t a sustainable option to create a report that needed regular updates. The reason? It was incredibly difficult to perform VLOOKUPs across multiple files. (It was actually easier to actually copy and paste the data into different tabs on one file, but that method is not efficient.) Even if I managed to get my data in one Excel file, VLOOKUPs can only be performed on one pair of key columns, and sometimes I needed three, four, or five keys in order for the data to blend together correctly. And when I got the dreaded #N/A error in each cell? I had to troubleshoot in the same way you do with Christmas lights—when the entire string goes out, you have to test each bulb one by one to see which one is causing the problem.
Now sure, there’s a lot of tricks, add-ons, and plugins that we could have Googled and figured out how to do this all in Excel. However, the director of my department already informed my customer that we would have a visual analytics tool for them to review next week to get them on board with this program. (He promised this to the customer before talking to me, of course.) I needed this done and I needed it done now. So, we pivoted (pun intended) to using Access.
Stage two: Relearning Access
I had not used Access since college so I had to dig in my closet and blow the dust of my old Computer Science 101 textbook. Since not a lot of the people on my team had much Access experience, we knew this would be an issue going forward with onboarding new team members. Access isn’t as widely used as it once was. Sure, it could accomplish a lot of the tasks we were looking for, but it was a painful process. We could join by using multiple keys, but when you create the join, you have to query the results at scale to find out if it worked. This is pretty much the case with anything in Access. You plug in your logic (which still requires you to memorize tons of formula syntax) and then you run the query. It would take a long time, so we would typically run to Starbucks or watch Youtube videos while we waited to see if what we did actually worked. If not, rinse and repeat.
We managed to get the project done and the customer bought into the program, but managing this report each week was a nightmare. I had to dedicate almost a full day to generating the report for the dashboard. If anything needed to be changed or updated in Access, I could not work on anything else for the rest of the week. We still needed a new solution, and that’s when I discovered self-service data prep platform Trifacta.
Stage three: Discovering Trifacta, the Netflix of data prep
The very first thing I saw in Trifacta was the predictive suggestions. If I simply highlighted a pattern of string characters of numbers it gave me suggestions, in natural language, of what Trifacta thought I might want to do with the data. “Do you want to Extract this pattern into a new column?” However I interacted with my data, Trifacta would provide these suggestions. I did not have to memorize formulas, commands, or syntax—it was more like I was actually working with another human than a computer program. Cleansing and structuring my data was a breeze in Trifacta, but the next test was joining data together. Not only could I join together data with multiple keys in the join interface, but I could use inner and outer join functionality, and my keys did not have to match exactly. I could join keys together regardless of white space, special characters, and casing.
I was able to get this project finished and make it sustainable. Running the report was almost pretty much automated. I actually got better at my job because I was spending less time fixing data so I could work with it and I could focus on what was actually in the data. I became more proactive as opposed to reactive. I started to solve problems. Now that I had the process down, I could onboard new customer data in a fraction of the time it took with Excel and Access. For one report, we spent about 3 months locked in a conference room putting all of the data together. My last day at the company, I finished a similar project by lunch.