Working with your own data is challenging enough, but today most organizations rely on some form of third-party data, too. Whether it’s data from multiple retail partners used to plan sales and inventory, or a variety of cancer research data used to develop new clinical trials, organizations must quickly explore, structure, and combine unfamiliar data, and that’s not always easy.
In this post, we review some of the common challenges involved in working with third-party data, and how modern big data solutions are helping organizations reduce wrangling time to get to analysis faster.
Mo’ Data, Mo’ Problems: Unfamiliar Data Sources
The challenge with third-party data is that it’s not yours, which means it often arrives in a host of different formats and standardizations. From Excel to CSV to raw text files, you likely don’t get to choose how you want your data delivered, let alone how it should be formatted.
PepsiCo’s CPFR (Collaboration, Planning, Forecasting & Replenishment) team faced this challenge when working with its national retail partners, all of whom handle dates and times differently and use different field names for their sales data. PepsiCo needed this data in order to forecast sales, but its partners weren’t able to put in the extra hours to wrangle it. Similarly, NationBuilder needed to aggregate millions of voter registration records from over 3,000 counties in the United States, all delivered as inconsistent and poorly formatted data sets. It was a daunting task.
In both cases, working with third-party data required extensive preparation. That’s not unusual: data analysts commonly report spending as much as 80% of their time preparing data, which leaves little time for analyzing the problem at hand.
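To make the preparation work concrete, here is a minimal Python sketch of the kind of standardization described above: mapping each partner's field names onto one shared schema and normalizing differing date formats. The field names, aliases, and date layouts are hypothetical, chosen only for illustration; they are not PepsiCo's or NationBuilder's actual data, and this is not how Trifacta itself works.

```python
from datetime import datetime

# Hypothetical aliases: each partner's field names mapped to one shared schema.
FIELD_ALIASES = {
    "sale_date": {"date", "sale_date", "transaction date", "week_ending"},
    "units_sold": {"units", "qty", "units_sold", "quantity sold"},
    "store_id": {"store", "store_id", "location"},
}

# Assumed date layouts, tried in order. An ambiguous value such as
# 03/04/2015 is read with the first layout that matches (month first here).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def canonical_field(name):
    """Return the shared-schema name for a partner's field name."""
    key = name.strip().lower()
    for canonical, aliases in FIELD_ALIASES.items():
        if key in aliases:
            return canonical
    return key  # unknown fields pass through, lowercased


def parse_date(value):
    """Convert a date string in any known layout to ISO 8601 (YYYY-MM-DD)."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def standardize_record(record):
    """Rename fields and normalize values; values are assumed to be strings,
    as they would be when read from a CSV or text file."""
    clean = {canonical_field(k): v for k, v in record.items()}
    if "sale_date" in clean:
        clean["sale_date"] = parse_date(clean["sale_date"])
    if "units_sold" in clean:
        clean["units_sold"] = int(clean["units_sold"].replace(",", ""))
    return clean
```

For example, `standardize_record({"Transaction Date": "03/15/2015", "Qty": "1,200", "Store": "A17"})` and `standardize_record({"date": "2015-03-15", "units": "1200", "location": "A17"})` produce the same record. Every new partner, field name, or date layout means another entry to maintain by hand, which is the burden the rest of this post is about.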
Excel Isn’t Excellent For Big Data
You like Excel. We like Excel. But both PepsiCo and NationBuilder knew that Excel wasn’t the right tool for these jobs. They were working with so much data that manual preparation in Excel was extremely time-consuming, not to mention prone to errors.
But even if time and errors weren’t a factor, the data from retailers and researchers arrives again and again: weekly, monthly, quarterly. PepsiCo and NationBuilder wanted a way to standardize and automate the process of extracting and transforming this data without investing in expensive engineering labor. Even with engineering support, hand-coded transformations remain one-off work that can’t be shared easily across an organization, and other groups have no way of learning about existing work or insights. PepsiCo and NationBuilder wanted their analysts laser-focused on their purpose: analyzing data to provide actionable insight.
NationBuilder: Aggregating Voter Data At Scale, In Near Real-Time
Some companies have a business model based entirely on the quality of their data sets and their services in analyzing that data. The data itself is the asset, and it must be deep, clean, and accessible in order for the company to maximize its market value. NationBuilder aggregates voter registration data from over 3,000 counties in the United States, a challenging task on its own given the large, inconsistent, and poorly formatted nature of county voter databases. With millions of records at the state, county, and city level being updated constantly, NationBuilder needs to onboard data in a way that’s fast, accurate, easy to maintain, and scalable.
By using Trifacta, NationBuilder was able to dramatically reduce the time spent building a national voter file and eliminate the need for custom data transformation tools, empowering a much larger and less technical team to derive insights and value from its important data set.
What’s the bottom line? Your data analysts live to bring you insight. With Trifacta, that’s what they get to focus on, by quickly exploring, structuring, and integrating third-party data with your own. Eliminating reliance on manual data preparation can help your organization get faster, more accurate business analysis and better business results. Trifacta helps you have more confidence in your data, which means you can finally harness the power of big data to transform your future and our world.
To learn more about wrangling data for data onboarding, read our brief, Data Onboarding: A Survivor’s Guide To Combining Unfamiliar, Disparate Data; or download the free Principles of Data Wrangling eBook here.