The Rise of Data Wrangling
Of the many steps required to analyze big data accurately, none is more time-consuming than data preparation: an estimated 80 percent of the analysis process is spent preparing data. Attempts to solve the problem of data preparation began in academic circles, where, over the course of two decades, teams at UC Berkeley and Stanford developed new ways to prepare, or "wrangle," data for analysis. That early work paved the way for commercial ventures like Trifacta, built specifically to wrangle data and connect with a constantly growing variety of data sources.
With the increasing pervasiveness of big data, technology analysts are paying closer attention to the data preparation market. Forrester, an industry leader in vendor evaluation, devoted a Forrester Wave report to data preparation vendors in Q1 for the first time. Dresner Advisory Services released its third annual End User Data Preparation Market Study in Q1 2017, ranking Trifacta the #1 End User Data Preparation Vendor for the third consecutive year. Bloor Research has been reporting on data wrangling since 2014 and released its most recent report on self-service data preparation in April 2016.
Life before Data Wrangling
Before data wrangling solutions like Trifacta hit the market, firms relied on Excel, hand-coded validation scripts, or ETL (extract, transform, load) processes to prepare data; with the advent of big data and cloud computing, it often took a combination of all three to get the job done. Even though businesses were spending 80 percent of their analysis time on data preparation, they didn't invest in dedicated data wrangling technology because it simply didn't exist.
Many enterprises used SAS, a powerful data science and statistical analysis tool, for both data preparation and analysis. Over the years, these organizations built up years' worth of SAS code that is expensive to write and maintain, difficult to trace lineage through, and nearly impossible to share with non-technical business users. The Royal Bank of Scotland (RBS), for example, leveraged SAS for data preparation, spending years honing transformation scripts; in fact, 70 percent of RBS's SAS code was written solely for data preparation. As the bank's customer service moved online, those scripts couldn't handle the large volumes of web data: analyzing just 1 percent of chat data took a month. The inability to analyze the other 99 percent of web chat data cost RBS millions of pounds in customer churn, unidentified complaint trends, and missed upsell and cross-sell opportunities.
Data Wrangling for SAS Optimization
There's a reason SAS has been a market leader in statistical analysis and business intelligence for the last 40 years: it excels at data science and statistical analysis. But it's not practical for data preparation. Modern, efficient data preparation solutions like Trifacta now allow firms to accelerate data wrangling so that SAS can be used for the sophisticated analytics it was built to do.
By integrating Trifacta into its analytics ecosystem ahead of SAS analysis, RBS was able to quickly explore and prepare 100 percent, not just 1 percent, of its complex customer web data. Insights from this data helped the firm better identify and meet customer needs, substantially increasing its Net Promoter Score and saving an estimated £3-4 million annually.
Trifacta makes data preparation more precise, efficient, and intuitive. Optimize your SAS investments with Trifacta data wrangling. Download this brief to learn more.