Introducing Trifacta’s Support for Azure Databricks

November 6, 2018

The past several years have seen huge growth in cloud computing environments. Of the major cloud providers, Microsoft Azure is recognized as the fastest-growing cloud, with a growth rate as high as 98% over the past 12 months. Trifacta customers such as Etihad, Kaiser, and Valley National Bank use Microsoft Azure as their platform of choice largely because of its elastic computing, which lets them quickly scale processing and storage up or down as demands shift. In addition, Azure's strong hybrid cloud capabilities and comprehensive AI services position it as one of the leading cloud platforms.

At the same time, data is moving faster. Analysts predict that by 2020 the amount of data generated will have grown 50x since 2010, and this rate will only continue to accelerate. The competitive value of data lies less in how much of it an organization can leverage than in how quickly the organization can respond to it. To that end, the Apache Spark™ engine has been instrumental in accelerating how organizations process data, while also addressing organizational concerns around security and governance, ultimately allowing organizations to leverage their data faster. And as more and more organizations move to the cloud, Databricks has been the leader in providing a native Apache Spark-based computing platform in the cloud. Azure Databricks, the Apache Spark-based analytics service hosted on Microsoft Azure, gives organizations on the Azure platform this processing power, along with added flexibility from its elastic, consumption-based architecture.

More storage and processing power is one side of the equation. But deriving value from data also means making it accessible to the people who have the right context for it. This is where data cleaning and data preparation come in: they are a critical part of analysis and are still widely reported as the biggest bottleneck in any analytics project, often accounting for up to 80% of its time and resources. Adopting better data preparation practices and solutions is key to maximizing the investment in a cloud platform and a Spark-based analytics service.

Trifacta + Azure Databricks

Trifacta already offers broad support for the Microsoft Azure data services ecosystem, including the availability of Wrangler Enterprise in the Azure Marketplace, existing support for deployment on Azure HDInsight (HDI), and integration with Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Data Warehouse. With the Azure Databricks integration, Trifacta solidifies its position as the data preparation platform with the broadest support for Azure.

Trifacta integrates with Azure Databricks by translating the transformations that a data analyst or data scientist builds in our application into runtime Spark code that executes on Azure Databricks. With full elasticity support on Azure, the user or admin doesn't have to provision execution resources or clusters ahead of time: the Databricks computing cluster grows and shrinks automatically based on the parameters of the transformation job, including data volume and transformation complexity. This delivers economies of scale and frees cluster resources for reuse when no jobs are running.
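To make the elasticity point more concrete, here is a minimal Python sketch of the kind of one-time job submission an integration like this could send. It assumes the Databricks Jobs API `runs/submit` request body with a `new_cluster` whose `autoscale` block sets `min_workers` and `max_workers`; the helper name `build_run_submit_payload`, the runtime version, the node type, and the script path are illustrative placeholders, not Trifacta's actual implementation.

```python
import json


def build_run_submit_payload(run_name, python_file, min_workers=2, max_workers=8):
    """Build a hypothetical body for the Databricks Jobs API runs/submit call.

    The run gets its own new cluster with autoscaling enabled, so nothing is
    provisioned ahead of time and the worker count can vary between
    min_workers and max_workers while the job runs.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "run_name": run_name,
        "new_cluster": {
            "spark_version": "5.0.x-scala2.11",  # assumed runtime version (placeholder)
            "node_type_id": "Standard_DS3_v2",   # assumed Azure VM type (placeholder)
            "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        },
        # Hypothetical path to the Spark code generated from the user's recipe.
        "spark_python_task": {"python_file": python_file},
    }


if __name__ == "__main__":
    payload = build_run_submit_payload("wrangle-job", "dbfs:/jobs/recipe.py")
    print(json.dumps(payload, indent=2))
```

Because a `runs/submit` job with a `new_cluster` runs on a job cluster created for that run, the cluster exists only while the job does, and autoscaling adjusts its size within the stated bounds.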

On the security front, both Trifacta and Databricks support SSO standards on Microsoft Azure. Each job is authenticated with a dedicated application token, which can be granted and revoked through the Databricks console and is tied to the individual user who authenticated through the application, so full job lineage, governance, and traceability are possible.
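As a rough illustration of the token mechanism, the sketch below prepares (but does not send) a request to the Databricks REST API. It assumes the common convention of passing a Databricks token as a bearer token in the `Authorization` header and targets the Jobs API `runs/submit` endpoint; the workspace URL, token value, and helper name `build_submit_request` are hypothetical.

```python
import json
import urllib.request


def build_submit_request(workspace_url, token, payload):
    """Prepare (but do not send) a POST to the Jobs API runs/submit endpoint.

    The token travels as a bearer token, so revoking it in the Databricks
    console causes any later request that presents it to be rejected.
    """
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/api/2.0/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,  # hypothetical per-user token
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_submit_request(
        "https://example.azuredatabricks.net",  # hypothetical workspace URL
        "dapi-example",                          # placeholder, not a real token
        {"run_name": "wrangle-job"},
    )
    print(req.get_method(), req.full_url)
```

Because each token is issued to an individual user, every job submitted with it can be attributed to that user, which is what makes the lineage and traceability described above possible.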

What’s next

The Azure Databricks integration is a significant milestone in our commitment to supporting data preparation workloads on the Azure platform and across all major cloud platforms. We're excited to continue strengthening our partnerships with Microsoft Azure and Databricks and to support all data preparation users, wherever their data lives.

To learn more about Trifacta, sign up to try our free Wrangler edition.