Trifacta v2 Simplifies Hadoop Analytics
New release radically increases analyst productivity when wrangling data of all shapes and sizes
San Francisco, CA – October 9, 2014 – Trifacta, a leading Data Transformation Platform provider, today announced the release of the Trifacta Data Transformation Platform v2. This release extends Trifacta’s leadership in simplifying data preparation for Hadoop by:
- Introducing advanced visual data profiling capabilities that guide users through a deep understanding of the characteristics of any data set.
- Providing native support for more complex data formats, including JSON, Avro, ORC and Parquet.
- Leveraging the multi-workload processing features of Hadoop to scale data transformation processing seamlessly from small to big data through native use of both Spark and MapReduce.
Advanced Visual Data Profiling
Before data scientists, IT programmers, and business analysts can manipulate their data for analysis, they must first profile it to confirm that it is fit and accurate, a time-consuming step that to date has largely consisted of manual testing and programming. Trifacta v2 combines machine learning and interactive data visualization techniques to automatically evaluate the distribution and statistical relevance of data. This gives analysts immediate visibility into characteristics of the data set such as data distributions, gaps in data collection, and unusual skew.
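As a rough illustration of the kind of checks profiling involves, the sketch below summarizes a single numeric column: how many values are missing, how many are distinct, and how skewed the distribution is. It is a minimal Python example written for this release, not Trifacta's implementation; the function name profile_column and the sample values are assumptions.

```python
import statistics


def profile_column(values):
    """Summarize one numeric column: missing count, distinct count, mean, skew.

    Hypothetical helper for illustration only; None marks a missing value.
    """
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if len(present) >= 2:
        mean = statistics.fmean(present)
        stdev = statistics.pstdev(present)
        summary["mean"] = mean
        # Population skewness: the average cubed z-score.
        # A constant column has no spread, so its skew is reported as 0.0.
        summary["skew"] = (
            sum(((v - mean) / stdev) ** 3 for v in present) / len(present)
            if stdev > 0
            else 0.0
        )
    return summary


# Made-up sample: one gap in collection and one outlier that skews the column.
report = profile_column([1, 2, 3, None, 100])
```

A positive skew value together with a non-zero missing count is the sort of signal that, surfaced early, can show a data set is not yet fit for modeling.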
“Visualization used to be the end goal of analysis,” said Anna Dorofiyenko, VP of Data at MarketShare. “But with platforms like Trifacta, we’re seeing that visualizing data as part of the data transformation process can dramatically speed data preparation. Our analysts are finding that in cases where we used to get two weeks into data processing and find that the data was not fit for modeling, Trifacta is able to tell us that in two seconds.”
Complex Data Format Support
In the world of Hadoop, where data is multi-format in nature, human processing of raw text, semi-structured, and tabular data presents a significant challenge to getting the data analysis process off the ground. Determining the shape of the data and translating that shape into human-readable formats is often a barrier to even starting analysis. For example, nested data structures like JSON have become a more standard data source for analysis of application and web data, yet they can be challenging to navigate manually. Trifacta has continued to increase the variety of data shapes that it supports, and announced JSON support last month.
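To make the nested-data problem concrete, the sketch below flattens a nested JSON record into a single row with dotted column names, which is one common way to turn such data into a tabular shape. It is a generic Python example under assumed conventions (the flatten function, the dotted naming scheme, and the sample record are all hypothetical), not a description of how Trifacta handles JSON.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts and lists into one flat dict with dotted keys."""
    if isinstance(record, dict):
        items = record.items()
    elif isinstance(record, list):
        # List positions become part of the column name, e.g. "tags.0".
        items = enumerate(record)
    else:
        # A scalar is a leaf: store it under the path built so far.
        return {prefix: record}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# Made-up application event with a nested object and a list.
raw = '{"user": {"id": 7, "tags": ["a", "b"]}, "event": "click"}'
row = flatten(json.loads(raw))
# row == {"user.id": 7, "user.tags.0": "a", "user.tags.1": "b", "event": "click"}
```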
The Trifacta v2 release automates the interpretation of Hadoop-specific data storage formats, removing the parsing bottleneck for analysts. Trifacta v2 introduces support for both the input and output of the row-oriented serialization format Avro and the columnar formats ORC and Parquet.
Trifacta’s ability to ingest, transform, and write back out JSON, Avro, ORC, and Parquet data means that Trifacta-transformed data is immediately accessible through a wide variety of Hadoop’s SQL-access frameworks, including Stinger, Apache Drill, and Impala.
Spark & MapReduce Multi-workload Processing
In addition to transforming data of any shape, Trifacta v2 optimizes data transformation processing for data of any size. Trifacta’s domain-specific language (DSL) architecture automatically matches the size of the data being transformed with the data processing engine best suited for the workload. With Trifacta v2, this means support for both MapReduce batch processing and interactive Spark processing. Customers are not required to understand high-level languages like Pig or Scala, or to choose between Hadoop execution frameworks like Spark or MapReduce. Using the Trifacta DSL, called Wrangle, transformation logic is defined once and then automatically translated into native code for the Hadoop processing framework best suited for the data set being transformed.
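The define-once, run-anywhere idea can be sketched in a few lines of Python. Everything in the example is an assumption made for illustration: the Step class, the 10 GB cutoff, and the size-based rule are hypothetical, the step names are not Wrangle syntax, and the release does not describe how the engine is actually selected.

```python
from dataclasses import dataclass

# Hypothetical cutoff: inputs at or below this size run interactively on Spark.
SPARK_MAX_BYTES = 10 * 1024**3


@dataclass(frozen=True)
class Step:
    """One engine-neutral transformation step (illustrative, not Wrangle)."""
    op: str
    column: str


def choose_engine(input_bytes, threshold=SPARK_MAX_BYTES):
    """Pick an execution engine from the input size alone (assumed rule)."""
    return "spark" if input_bytes <= threshold else "mapreduce"


def plan(steps, input_bytes):
    """Bind the same logical steps to whichever engine suits the input."""
    engine = choose_engine(input_bytes)
    return {
        "engine": engine,
        "steps": [f"{engine}:{s.op}({s.column})" for s in steps],
    }


# The transformation logic is written once...
steps = [Step("trim", "name"), Step("split", "address")]

# ...and planned separately for a small sample and a large data set.
small = plan(steps, 50 * 1024**2)    # 50 MB sample
large = plan(steps, 500 * 1024**3)   # 500 GB data set
```

Here the same two steps are planned for Spark on the 50 MB sample and for MapReduce on the 500 GB data set, without the user choosing a framework.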
“Trifacta is helping our customers extract maximum business value from both their Hadoop investment and more importantly their data stored in that Hadoop cluster,” said Tim Stevens, vice president, Business and Corporate Development, Cloudera. “Trifacta’s investment in integrating with engines like Spark and Impala help drive innovation in the data preparation space and provide a uniquely differentiated solution to market.”
With Hadoop use expected to grow 25x by 2020, according to Allied Research, and with organizations using Hadoop looking at an average of 8-10 different data sets in each analytic process, self-service data preparation has become a significant enabler to extracting value from Hadoop.
“Trifacta has embraced the Hadoop stack, with native support for Hadoop standards like HCatalog and Spark, as well as common data compression formats like Avro, ORC and Parquet,” said Joe Hellerstein, CSO of Trifacta. “We understand that open standards drive the Hadoop ecosystem, and we are committed to remaining in sync with community standards and innovations. We’re excited to see our technology supporting some of the largest Hadoop deployments in the world, while also giving a range of customers an easy way to use Trifacta on typical data sets at smaller scale points.”
Trifacta v2 will be previewed at Strata + Hadoop World New York 2014. At the same event, Trifacta will host a customer panel on Agile Data Transformation on October 17 at 1:45pm, with speakers from LinkedIn, MarketShare, Autodesk, and Orange.
Trifacta will also host a joint webinar with Cloudera on the topic of Transforming Data of All Shapes and Sizes for Pervasive Analytics on November 12.
- Getting Big Data to Work: Agile Data Transformation in Hadoop Panel
- Register for the webinar with Cloudera & Trifacta
- Learn more about Trifacta
- Read more on the Trifacta blog: https://www.trifacta.com/blog/
- Follow us on Twitter: https://twitter.com/trifacta
- Become a fan on Facebook: https://www.facebook.com/Trifacta
- Connect on LinkedIn: http://www.linkedin.com/company/trifacta
Trifacta, the pioneer in data transformation, significantly enhances the value of an enterprise’s Big Data by enabling users to easily transform raw, complex data into clean and structured formats for analysis. Leveraging decades of innovative work in human-computer interaction, scalable data management and machine learning, Trifacta’s unique technology creates a partnership between user and machine, with each side learning from the other and becoming smarter with experience. Trifacta is backed by Accel Partners, Greylock Partners, and Ignition Partners.
Nolan Necoechea for Trifacta