Blog content originally posted at hortonworks.com
The most commonly reported use of Hadoop today is data transformation. After standing up a cluster and scaling it for raw data collection, organizations begin the hard work of preparing various types of data for new analytic use cases.
In many Hadoop implementations this process has slowed wider enterprise adoption of Hadoop. Hand-coded scripts to transform and prepare data, written in languages unfamiliar to analysts, present a steep barrier and often leave business context out of the Hadoop data transformation process. Yet without getting past data wrangling, it is difficult, if not impossible, to achieve the full business advantage of your Hadoop cluster. Successful, broad-based adoption of Hadoop comes from unlocking new value from data. As Shaun Connolly recently shared:
“We’ve seen a repeatable pattern of adoption that starts with a focus on a new type of data and creating a targeted application. These new applications are typically driven by line of business and start with one of the following new types of data: Social Media, Clickstream, Server Logs, Sensor & Machine Data, Geolocation Data, and Files (Text, Video, Audio, etc.). Ultimately deploying more applications leads to a broader modern data architecture. But the successful customers started their journey by unlocking value from specific types of data and then rinsing and repeating from there.”
Both Trifacta and Hortonworks share a commitment to accelerating Hadoop adoption. We’re both focused on improving data analysis and changing the way people work with data. And we’re starting to see it happen.
What we’ve seen over the last year is that the organizations that succeeded in proving business value on Hadoop drove their implementations with transformation for a purpose: they had a specific analytic in mind that drove the transformation work. Today there’s a quicker way to achieve purpose-driven transformation and scale the rinse-and-repeat model. Trifacta is one of a handful of vendors pioneering the new space of Data Transformation.
Our unique approach to this is called Predictive Interaction™. Predictive Interaction™ tools replace tedious manual coding of data transformation scripts with a lightweight interaction for the user, grounded by concrete data transformation scripts generated by the software. This new interactive technology transforms low-level programming tasks into high-level visual interaction in the following interactive loop:
· Visualize: Intelligent visualization software presents the user with overviews and details of their data, including examples of raw content, and charts that are automatically derived from the data.
· Interact: The user interacts directly with data and charts in the visual interface, highlighting values, structures or trends of interest.
· Predict: The user is presented with a ranked list of data transformations predicted by the software’s algorithms, and can quickly browse visual previews of each suggestion’s outcome to choose or adapt the best one.
The result is that users can transform data quickly without the tedious manual process of hand-coding each and every transformation step.
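To make the loop above concrete, here is a minimal sketch of the "predict" step: given a single value the user has highlighted and the result they want, rank a set of candidate transformations by how well each reproduces the example. This is an illustrative toy, not Trifacta's actual algorithm; the candidate transforms, the scoring heuristic, and the preview output are all hypothetical.

```python
# Toy sketch of a predict-by-example step. A real system would infer
# transforms over whole columns and structures, not single values;
# everything here is a simplified stand-in.

def split_on_comma(value):
    return value.split(",")

def trim_whitespace(value):
    return value.strip()

def uppercase(value):
    return value.upper()

# Hypothetical library of candidate transformations.
CANDIDATES = [split_on_comma, trim_whitespace, uppercase]

def score(transform, example_in, example_out):
    """Score 1.0 if the transform reproduces the user's example, else 0.0."""
    try:
        return 1.0 if transform(example_in) == example_out else 0.0
    except Exception:
        return 0.0

def rank_transforms(example_in, example_out):
    """Return (name, preview) pairs for all candidates, best match first."""
    ranked = sorted(
        CANDIDATES,
        key=lambda t: score(t, example_in, example_out),
        reverse=True,
    )
    return [(t.__name__, t(example_in)) for t in ranked]

# The user highlights the raw value "  alice  " and indicates the
# desired result "alice"; the system previews each suggestion.
previews = rank_transforms("  alice  ", "alice")
```

The key idea this sketch tries to capture is that the user never writes the script: they supply an example interaction, and the software generates and ranks concrete transformation candidates, each with a previewable outcome.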
We’re excited to formally announce our partnership with Hortonworks with the certification of Trifacta on the Hortonworks Data Platform (HDP). Combining the reliability and stability of HDP with Trifacta’s unique Predictive Interaction™ solution allows users to create concise data transformation scripts that compile natively to Hadoop and are designed to scale. Together with Hortonworks we will certify, support and deliver the Trifacta platform to the HDP community, ensuring the ongoing compatibility of Trifacta’s Data Transformation Platform for Enterprise Apache Hadoop users. With the help of the community, we’re looking forward to driving more enterprise adoption of Hadoop together.