Trifacta’s Intelligent Execution architecture adds new Photon in-memory processing engine to complement support for Apache Spark and MapReduce
San Francisco, CA – March 29, 2016 – Trifacta, the global leader in data wrangling, today announced the Photon Compute Framework, a technology enhancement at the core of its award-winning data wrangling interface. Photon was developed specifically to provide Trifacta’s users with a richly interactive and computationally intelligent data wrangling experience on large in-memory datasets. Architected to be Apache Arrow compliant, Photon both powers Trifacta’s user experience and complements Trifacta’s support for popular open source distributed data processing frameworks such as Apache Spark and MapReduce.
“The marriage of design and technology is what makes breakthrough user experiences possible,” said Joe Hellerstein, co-founder and chief strategy officer at Trifacta. “Photon provides a huge boost to the immediacy and scale of our interfaces for data exploration and transformation. Building on new concepts in high-performance, data-centric computing from both research and industry, Photon powers an unprecedented, immersive user experience. Trifacta users can now explore, profile and transform serious volumes of data with instant feedback from a host of machine learning algorithms and visualizations. The user experience that Photon enables is completely unique in its scale, richness and immediacy.”
Photon incorporates new technologies from high-performance in-memory compute frameworks and embeds them directly into the Trifacta interface. Photon is architected to underpin Trifacta’s immersive data wrangling experience, enabling what is widely regarded as the most intuitive and efficient workflow for preparing data for analysis. With Photon, Trifacta users receive the following benefits:
- At-Scale Immediacy Increases Productivity – With Photon, users receive immediate feedback when interacting with the content of their data, and are never removed from their workflow or forced to wait for processing to complete. Whether operating on complete in-memory datasets or samples of big data, users are able to interactively wrangle data volumes that are orders of magnitude larger than was previously possible.
- Enhanced Performance Drives Better Intelligence – Like intelligent programs for Chess and Go, Trifacta constantly anticipates the next moves that a data analyst might want to make when wrangling data. Photon’s in-memory engine allows Trifacta to explore this space instantly with orders of magnitude more data and computation than previously possible, ranking suggestions and presenting them to users with rich visualizations of potential outcomes.
- Lightweight Footprint Enhances Portability – Photon provides all of the performance features offered by modern in-memory processing frameworks, such as multi-threaded computation, LLVM compilation, columnar layout and pipelined data processing, yet requires only a minimal memory footprint. As a result, Photon runs natively and directly within the browser and in single-node environments. Additionally, Photon snaps into Trifacta’s Intelligent Execution architecture as the perfect complement to the more resource-intensive distributed computing frameworks, like Spark and MapReduce, that Trifacta supports for big data processing.
“While working with some of the world’s largest datasets during my time at LinkedIn, I found that the ability to rapidly prototype and interact with real data at scale was critical to iterate on ideas and algorithms,” said Pete Skomoroch, data scientist and entrepreneur. “With Photon, analysts can operate in an immersive data wrangling experience that provides instant feedback on large datasets in-memory. This is already a best practice in data science, but Trifacta now puts this capability directly into the hands of anyone who works with data.”
Data analysts and scientists are most efficient and effective when they receive immediate feedback on interactions with their data. Disruptions in the data wrangling process not only slow the end-to-end preparation work, they can also inhibit the adoption of end-user tools for data preparation because users are forced to struggle with a frustratingly slow and inefficient workflow. Photon was developed to eliminate these disruptions and deliver a fluid data wrangling experience for data at any scale.
As part of the development of the Photon Compute Framework, Trifacta collaborated with leading technology organizations, including Cloudera and Databricks, to make Photon compliant with Apache Arrow. This effort ensures Photon’s interoperability with present and future innovations from the open source ecosystem and allows Photon to leverage Arrow’s benefits in accelerating the performance of analytical workloads by more than 100 times in certain use cases.
“Trifacta’s Photon team was an early participant in the design of the open-source Apache Arrow project,” said Marcel Kornacker, tech lead at Cloudera. “Arrow is a new in-memory columnar data structure to standardize in-memory processing and interchange across the ecosystem—a community effort that includes our team at Cloudera as well as developers from Amazon, Databricks, Dremio, MapR, Trifacta and Twitter. Arrow’s efficient design will accelerate workloads connecting frameworks like Photon to Hadoop frameworks (including Impala and Spark), and enable native interoperability for languages like Python and R for better data access and high-performance analytics.”
See Photon in Action
Trifacta will unveil Photon at Strata + Hadoop World in San Jose in the session Architecting immediacy: The design of a high-performance, portable wrangling engine.
The company has a total of three sessions at Strata + Hadoop World in San Jose from Tuesday, March 29 through Thursday, March 31 and will be offering live demos and consultation throughout the show at booth #831.
Trifacta at Strata
- Wrangling, metadata, and governance: Supervision vs. adoption
  - Date: Wednesday, March 30
  - Time: 11:50am – 12:30pm PT
  - Location: LL21 A
  - Speakers: Wei Zheng (Trifacta); Mark Donsky (Cloudera); Mohan Sadashiva (Waterline Data Science)
- Grounding big data: A meta-imperative
  - Date: Wednesday, March 30
  - Time: 1:50pm – 2:30pm PT
  - Location: 210 D/H
  - Speakers: Joe Hellerstein (UC Berkeley/Trifacta); Vikram Sreekanti (Berkeley AMP Lab)
- Architecting immediacy: The design of a high-performance, portable wrangling engine
  - Date: Thursday, March 31
  - Time: 1:50pm – 2:30pm PT
  - Location: LL21 E/F
  - Speakers: Joe Hellerstein (UC Berkeley/Trifacta); Seshadri Mahalingam (Trifacta)
- Read Joe Hellerstein’s blog about the Photon Initiative
- Learn more about Trifacta
- Follow us on Twitter
- Become a fan on Facebook
- Connect on LinkedIn
Trifacta, the global leader in data wrangling software, significantly enhances the value of an enterprise’s big data by enabling users to easily transform and enrich raw, complex data into clean and structured formats for analysis. Leveraging decades of innovative work in human-computer interaction, scalable data management and machine learning, Trifacta’s unique technology creates a partnership between user and machine, with each side learning from the other and becoming smarter with experience. Trifacta is backed by Accel Partners, Cathay Innovation, Greylock Partners and Ignition Partners.
Nolan Necoechea for Trifacta