Integrating Unfamiliar Data: Trifacta vs. Excel for Analytics Providers

March 24, 2020

The Challenge

Your company provides analytics as a service to your customers. Each time you bring on a new client, you kick off the same process: gathering the client’s data, getting to know the formats and accessing the contents of the data, preparing that data for analysis by cleaning and standardizing problem areas, validating the output, and then uploading that data to your platform to provide valuable insights back to the customer. Far and away the biggest challenge in onboarding a new client’s data is understanding and preparing that data. Not only is this work painful and inefficient, it’s also the single biggest limiting factor in scaling your operations and becoming a more efficient business. Sound familiar?

Many teams rely on Excel to do this work. The customer sends data over SFTP, and then your team opens the various datasets in Excel and begins the time-consuming process of scrolling through the contents of the data, identifying data quality issues, understanding the columns and column formats available, and then starting to blend, clean, and reformat that data to the exact spec needed for your analytics platform. After doing this several times with a new customer, you start to get comfortable with the client’s data, but the process is still manual and tedious each time you need to update the analytics with a new batch of data. Customers with larger datasets are particularly frustrating, as each new step in Excel takes seconds or minutes to load or crashes the whole application.

Additionally, your team has to go through the same preparation steps each time your client wants a refresh of their analytics. Onboarding the data the first time is extremely time-consuming, and familiarity saves only a little time when you have to repeat the process over and over again. Many teams that rely on Excel have to dedicate a large percentage of each team member’s workday to repeatedly preparing data in Excel.

This manual process has led many analytics providers to seek tools to automate it, most commonly programming tools like Python. For each new customer, Python allows the implementation team to build a script that can then be automated and used for each new batch of data. Trouble is, Python requires a lot of technical expertise, and even seasoned users can find it difficult to understand the contents of their data, which is essential when working with a new customer. You might think to pass this scripting work off to the engineering team, but they are already busy with software development and cannot possibly implement a new client in time to satisfy the customer’s needs.
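To make the Python option concrete, here is a minimal sketch of the kind of per-customer script an implementation team might write and rerun on every new batch. The column names (`customer_id`, `amount`), the cleaning rules, and the `prepare_batch` helper are all assumptions made up for illustration, not part of any particular client spec.

```python
import csv
import io

def prepare_batch(raw_csv: str) -> list[dict]:
    """Clean one batch of client data (hypothetical columns and rules)."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Normalize header names and trim stray whitespace from values.
        clean = {key.strip().lower(): (value or "").strip()
                 for key, value in record.items()}
        # Assumed rule: rows without a customer id cannot be loaded, so skip them.
        if not clean.get("customer_id"):
            continue
        # Assumed rule: amounts arrive as text such as "$1,200.50".
        clean["amount"] = float(clean["amount"].replace("$", "").replace(",", ""))
        rows.append(clean)
    return rows

# A small made-up batch with the kinds of problems described above.
sample_batch = "\n".join([
    "Customer_ID,Amount",
    'A100,"$1,200.50"',
    ",$15.00",
    " b200 ,30",
])

prepared = prepare_batch(sample_batch)
```

Even in a toy script like this, every rule had to be known in advance and written by hand. The script does nothing to help you discover that some amounts carry currency symbols or that some ids are missing, which is the gap the paragraph above describes.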

So, what do you do? 

Trifacta pairs the ease of use of tools like Excel with the automation capabilities of tools like Python, and adds visualizations and data quality information that make discovering the contents of your data quick and easy. Each step created in Trifacta during the preparation process is stored in a recipe, which can be scheduled to create a new self-service data pipeline for each customer. Trifacta reduces the time it takes to discover the contents of each customer’s data, prepare that data, and publish it to the analytics product, but the real value and time savings come from automating the work done in Trifacta. Once a recipe is created to take a customer’s data from raw to refined, that recipe can be scheduled to run hourly, daily, weekly, or at whatever interval is needed to ensure that your customers get up-to-date analytics whenever they need them.
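The recipe idea can be pictured as an ordered list of small steps that is replayed unchanged on each new batch. The sketch below is only a conceptual illustration in plain Python. It is not Trifacta's recipe format or API, and the step names, the `run_recipe` helper, and the sample fields are hypothetical.

```python
from typing import Callable

Row = dict[str, str]
Step = Callable[[list[Row]], list[Row]]

def trim_whitespace(rows: list[Row]) -> list[Row]:
    # Strip leading and trailing spaces from every value.
    return [{key: value.strip() for key, value in row.items()} for row in rows]

def drop_missing_id(rows: list[Row]) -> list[Row]:
    # Remove rows whose "id" field is empty.
    return [row for row in rows if row.get("id", "")]

def uppercase_state(rows: list[Row]) -> list[Row]:
    # Standardize state codes to upper case.
    return [{**row, "state": row["state"].upper()} for row in rows]

# The "recipe": steps in the order they were worked out on the first batch.
recipe: list[Step] = [trim_whitespace, drop_missing_id, uppercase_state]

def run_recipe(steps: list[Step], rows: list[Row]) -> list[Row]:
    # Apply each step in order, feeding its output into the next step.
    for step in steps:
        rows = step(rows)
    return rows

# Made-up batch for illustration.
first_batch = [{"id": " 1 ", "state": " ca"}, {"id": "  ", "state": "ny"}]
refined = run_recipe(recipe, first_batch)
```

Because the steps are stored rather than performed by hand, the same `recipe` can be handed to whatever scheduler runs the pipeline and applied to the next batch without repeating the manual work.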

Interested in trying Trifacta for yourself? Sign up for free today.
