Closing the Gap in Big Data Projects

November 30, 2014

Since joining Trifacta, I’ve had the pleasure of spending time with a number of our customers, hearing about how we’re helping them transform data of all shapes and sizes into actionable insight. The demand for self-service analytics on Hadoop has driven tremendous adoption of Trifacta across a variety of industries. Market leaders like MarketShare, Lockheed Martin, Orange Silicon Valley, and Autodesk see Trifacta as key to unlocking value in their Big Data investments and increasing the productivity of their data professionals. We fill the critical gap between more cost-effective, scalable data storage and processing infrastructure and the broad variety of increasingly sophisticated analysis and visualization technologies that require structured, distilled, cleansed, and enriched datasets to operate on.

This gap is widely recognized as one of the most troublesome issues facing businesses as they increase their adoption of Big Data. In fact, this past year’s Gartner Hype Cycle for Information Infrastructure recognized data preparation as one of the most difficult challenges facing business users of business intelligence and data discovery tools. As a sponsor and presenter at the 2014 Strata + Hadoop World conference in New York City this October, we heard the same thing again and again from people who stopped by our booth or attended one of the sessions we presented.

The highlight of the conference for me was having a few of our customers and users share the different challenges they face wrangling the variety of data that is relevant to their business. Although Lockheed Martin, Autodesk, MarketShare, and Orange Silicon Valley all work in separate industries and deal with very different forms of data, it was great to see them connect over the common difficulties they face in making their data useful for different forms of analysis.

Anna Dorofiyenko of MarketShare described the challenges of integrating the ever-growing variety of data that marketers (MarketShare’s clients) use to track the effectiveness of their online and offline marketing investments. With each new inbound customer dataset, her team needs to quickly familiarize themselves with the shape and content of the data in order to effectively blend it with the existing data used in their analysis – all while meeting strict SLAs. For her team, the ability to quickly understand new datasets and make judgments on how they need to be transformed makes Trifacta a critical part of the analysis products they provide their customers.
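As a rough illustration of what that first look at an unfamiliar inbound file involves (a hypothetical sketch in pandas, not MarketShare’s actual process or Trifacta’s product), the work usually starts with profiling the shape and content of the data, then blending it against the datasets already in use:

```python
# Hypothetical first pass at an unfamiliar inbound marketing file,
# before blending it with the data already used in the analysis.
import pandas as pd

# File names and column names here are illustrative only.
inbound = pd.read_csv("new_client_campaigns.csv")

# Profile the shape and content: column types, null counts, sample rows.
print(inbound.dtypes)
print(inbound.isna().sum())
print(inbound.head())

# Once the join key is understood, blend with the existing spend data.
existing = pd.read_csv("marketing_spend.csv")
blended = existing.merge(inbound, on="campaign_id", how="left")
```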

As a government contractor, Ravi Hubby of Lockheed Martin works with an entirely different type of customer, yet faces a similar set of challenges. Ravi’s team constantly deals with customer data from a variety of unfamiliar sources and must work through the process of uncovering each dataset’s potential for analytic use and defining the different ways the data needs to be transformed for each analysis. For Ravi, Trifacta plays a key role in supporting the variety of formats of his customers’ data, and it enables the analysts on his team to perform this preparation work more efficiently by eliminating the need to manually code each transformation.

For Charlie Crocker at Autodesk, his team’s efforts revolve around building valuable, consistent “views of data” out of a wide variety of the company’s raw information for Autodesk’s internal stakeholders. Their work involves interacting with Autodesk’s massive data lake, where various product and business data is ingested and stored. Trifacta plays a key role in giving his team more speed and agility in how they structure, compare, and link different raw datasets together to create relevant views for the variety of internal data consumers at Autodesk.

Transitioning years of business logic established in traditional database and ETL technology to a Hadoop-based environment has been the greatest challenge for Xavier Quintuna of Orange Silicon Valley. The complexity and cost of their existing data processing technologies did not fit the direction Orange was headed, given their investment and future growth plans in Hadoop. As Orange transitions to a Hadoop-oriented data processing pipeline, Trifacta enables Xavier’s team to migrate the complexity of their existing business rules to Hadoop through an intuitive interface that his team can quickly gain proficiency in. With Trifacta, Orange is able to benefit from the agility of Hadoop’s support for “schema on read” without having to hire a team of expensive specialists to achieve their objectives.
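For readers less familiar with the term, “schema on read” simply means the raw files land in Hadoop exactly as they arrive, and structure is imposed only when the data is read for analysis rather than being enforced up front as in a traditional ETL load. A minimal, hypothetical sketch of the idea using PySpark (not Orange’s actual pipeline or anything specific to Trifacta) looks like this:

```python
# Minimal, hypothetical "schema on read" sketch with PySpark.
# Raw event logs sit in HDFS untouched; the schema is applied at read time.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# The schema lives with the query, not the storage layer, so it can evolve
# with the analysis without reloading or reshaping the raw data.
schema = StructType([
    StructField("event_time", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

# Illustrative HDFS path; the raw JSON was written exactly as it arrived.
events = spark.read.schema(schema).json("hdfs:///raw/events/")
events.groupBy("customer_id").sum("amount").show()
```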

We’re excited to be involved with these forward-thinking individuals as they break new ground in enabling their organizations to work with data with more agility and efficiency. Whether they are wrangling hundreds of small, manually created spreadsheets or petabytes of machine-generated data, we’re proud to help each of these companies do the hard work of making their data useful for analysis.