Today, we’re excited to announce the release of Trifacta v4, which marks a huge step forward in our continued focus on putting the power of data wrangling into the hands of the people who have the most context and understanding of the data. You can read our official announcement in the press release here, and learn from some of our customers about how v4 will specifically impact the innovative work they’re doing with Trifacta.
As Trifacta’s VP of Products, I find the v4 release exciting because it represents a dramatic enhancement to the most core element of our product: our user experience. From day one, Trifacta has focused on situating users directly within the context of their data, encouraging analysts to explore and experiment with their data in ways they were never able to with traditional tools. It’s allowed our users to drive new levels of efficiency, to ask bigger questions and uncover unanticipated insights, and, in the process, to really have fun with wrangling.
But v4 goes beyond just upgrading the experience of wrangling data in Trifacta. We have also done an extensive amount of work to improve the scale and performance of our application and the way in which we transform data sets that don’t have the scale to require parallel processing. Working closely with our customers has given us a unique vantage point. Our experience has proven that data heterogeneity is a fact of life and that there will always be data outside of the lake or spread between cloud and on-premises platforms. In response, we’ve expanded the breadth of data sources and cloud platforms supported by Trifacta.
All of us at Trifacta are excited to share what is undoubtedly our most significant release since the company’s inception, extending data wrangling to any user, any data and any cloud.
Enhanced User Experience
Giving our users the right context to wrangle their data started with a visual experience: automatically generated histograms, compelling data representations, and the ability to interactively brush over and highlight particular segments of the data for suggested wrangling tasks. It was, and remains, Trifacta’s differentiating feature. No one else offers such a visual, contextual experience.
When it comes time to edit or build more difficult wrangling steps from scratch, however, we found that our less-technical users would often get stuck and have trouble determining where to start. As data wrangling becomes increasingly important to all types of users, we know that we need to continue to provide all users with the same freedom that advanced data wranglers have, and allow them to more easily prepare data within the context of what they’re doing.
The general availability of Builder does exactly that, which my colleague Giorgio Caviglia will expand upon in a later post. In essence, this new workflow ensures users are guided through each step of their wrangling tasks with the assistance of drop-down menus and intelligent in-application guidance. Powered by the same intelligence as our core Predictive Transformation interaction model, Builder learns, adjusts, and responds to the user’s changing needs, delivering Trifacta’s most intuitive and accessible workflow for crafting data wrangling recipes yet.
Providing the right context for data wrangling has also manifested in the addition of several new features, including column lineage and pattern profiling. With column lineage, users can more effectively understand how a specific column or attribute of data originated, while pattern profiling helps users explore the various structures and attributes that make up each column. v4 also enables users to build and manage end-to-end data wrangling workflows that bring together multiple datasets, wrangling recipes and publishing formats or locations. As a whole, the new UI/UX features delivered in Trifacta v4 offer a richer, more intuitive experience that shortens the time it takes novice users to become data wrangling experts.
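To make the idea of pattern profiling concrete, here is a minimal Python sketch. It is not Trifacta's implementation; the function names and the `{alpha}`/`{digit}` tokens are assumptions chosen for illustration. It reduces each value in a column to a structural pattern and counts how often each pattern occurs.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Collapse runs of letters and digits into placeholder tokens.

    The token names are illustrative assumptions, not Trifacta syntax.
    """
    def token(match: re.Match) -> str:
        return "{digit}" if match.group()[0].isdigit() else "{alpha}"

    # Single pass so the placeholder text itself is never re-matched.
    return re.sub(r"[A-Za-z]+|[0-9]+", token, value)


def profile_column(values):
    """Return (pattern, count) pairs, most frequent pattern first."""
    return Counter(value_pattern(v) for v in values).most_common()


if __name__ == "__main__":
    phone_column = ["555-1234", "555-9876", "n/a"]
    for pattern, count in profile_column(phone_column):
        print(pattern, count)
```

A profile like this makes outliers easy to spot: values whose pattern differs from the dominant one are usually the ones that need a wrangling step.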
Improved Performance & Scale
With every interaction, Trifacta suggests new transforms and creates new visual representations, gaining intelligence over time, and fueling an iterative feedback loop that allows users to adapt accordingly. But it’s not enough for users to get that feedback; they need to get it fast. Where’s the productivity gain in wrangling data if you’re constantly watching a spinning wheel to complete each operation?
With the general availability of Photon, Trifacta’s optimized compute framework, we’ve enabled even faster performance on larger volumes of data directly within the application, which leads to huge productivity gains for users. (Stay tuned for a later post, in which we’ll delve into the technical details of why, exactly, we chose to build Photon and how it enhances both our user experience and the processing of data sets that don’t require parallelization.)
On the back end, Trifacta’s Intelligent Execution Architecture leverages the context of the data being worked on to automatically select the right data processing engine for each wrangling job. We know that big data means data of any size, which refutes the “one size fits all” rule when it comes to execution architecture. Why should our customers consume precious cluster resources when users are working with data that is only megabytes in size? Photon not only gives users immediate feedback when working with data, but it also offloads work from the distributed cluster, ensuring that only jobs with enough volume and complexity are executed there.
While Photon provides unmatched performance when wrangling data on-the-fly in the application, on a desktop or on a single server, we still provide support for parallel processing of large-scale datasets by compiling wrangle recipes to Spark (Spark 2.0 in v4) or optimized MapReduce.
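The engine-selection idea can be sketched in a few lines of Python. This is a simplified illustration, not how Trifacta actually decides: the `WrangleJob` type, the `choose_engine` function, and the 1 GiB cutoff are all assumptions, and a real system would likely weigh recipe complexity as well as data volume.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real threshold (if any single one exists) is not public.
SINGLE_NODE_LIMIT_BYTES = 1024 ** 3  # 1 GiB, chosen only for illustration


@dataclass
class WrangleJob:
    """Illustrative stand-in for a wrangling job and its input size."""
    name: str
    input_bytes: int


def choose_engine(job: WrangleJob, limit: int = SINGLE_NODE_LIMIT_BYTES) -> str:
    """Route small inputs to a single-node engine, large ones to the cluster."""
    return "photon" if job.input_bytes <= limit else "spark"


if __name__ == "__main__":
    jobs = [
        WrangleJob("daily_extract", 5 * 1024 ** 2),    # a few megabytes
        WrangleJob("clickstream_full", 50 * 1024 ** 3),  # tens of gigabytes
    ]
    for job in jobs:
        print(job.name, "->", choose_engine(job))
```

The point of the sketch is the routing decision itself: small jobs never touch the cluster, so cluster resources stay free for the jobs that genuinely need parallel processing.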
Extended Cloud Deployment & Data Source Connectivity
As data wrangling use cases continue to involve a growing variety of enterprise sources, the ability to connect to common databases is increasingly critical. With v4, we’ve expanded the range of data sources we’re able to connect to by adding support for popular databases such as Microsoft SQL Server, MySQL, Oracle, PostgreSQL and Teradata. Improving our support for embedding Trifacta in various cloud environments is also a key focus of v4, as an increasing number of enterprise analytic workloads move to the cloud. With v4, we’ve expanded support for deploying Trifacta in three popular cloud ecosystems: Amazon Web Services, Google Cloud Platform and Microsoft Azure. While we have greatly expanded the data sources for wrangling, we know there is more data out there that we can’t reach today. For that reason, we’re also introducing a standard connectivity API that encourages customers and partners to build their own connectivity to Trifacta.
What makes these expanded deployment and connectivity options in v4 unique is not just their availability, but the fact that we’ve built them in the same vein as the rest of our product, allowing for simple and immediate access with self-service configuration and management. Again, we’re putting the user first, not only considering what data sources or deployment environments they need to be successful, but how easily they can configure them.
Meeting the Needs of Insights-Driven Organizations
It has been exciting to see data wrangling mature over the last four years. Our customers are demanding smarter, faster and more automated functionality for repeated use across a broader set of data sources. With v4, we’re satisfying the bulk of these asks, and I can’t wait to put it in the hands of every user.