Introducing Trifacta’s Support for Azure Databricks

November 6, 2018

The past several years have seen huge growth in cloud computing environments. Of the major cloud providers, Microsoft Azure is recognized as the fastest-growing cloud with a growth rate as high as 98% over the past 12 months. Trifacta customers like Etihad, Kaiser, and Valley National Bank leverage Microsoft Azure as their platform of choice in large part because of its elastic computing, which allows them to quickly expand or decrease processing and storage in response to shifting demands. In addition, Azure’s deep expertise in hybrid cloud capabilities and comprehensive AI services position it as one of the leading cloud platforms.

At the same time, data is moving faster. By 2020, analysts predict that the amount of data generated will have grown by 50x since 2010, and this rate will only continue to accelerate. The true competitive advantage of data doesn’t necessarily lie in how much data organizations can leverage, but rather in how quickly they can respond to it. To that end, the Apache Spark™ engine has been instrumental in accelerating how organizations process data, while also responding to organizational concerns around security and governance, and ultimately allowing organizations to leverage their data faster. And as more and more organizations move to the cloud, Databricks has been the leader in providing a native Apache Spark-based computing platform in the cloud. Azure Databricks, the Apache Spark-based analytics service hosted on Microsoft Azure, offers organizations that are on the Azure platform this processing power, as well as increased flexibility with its elastic and consumption-based architecture.

Increased storage and processing power is one side of the equation. But deriving value from data also means making it accessible to those who have the right context for the data. This means cleaning and preparing that data for analysis, which is still widely reported as the biggest bottleneck in any analytics project, often accounting for up to 80% of the time and resources. Adopting improved data preparation practices and solutions is key to maximizing the investment of a cloud platform and Spark–based analytics service.

Trifacta + Azure Databricks
Trifacta already maintains expansive support for the Microsoft Azure data services ecosystem including the availability of Wrangler Enterprise in the Azure Marketplace, as well as existing support for deployment on Microsoft Azure HDI and integration with Azure Blob Storage, Azure Data Lake Storage, and SQL Data Warehouse. With the Azure Databricks integration, Trifacta solidifies its position as the data preparation platform with the broadest support for Azure.

Trifacta integrates with Azure Databricks by translating the transformations that a data analyst or data scientist develops in our application into runtime Spark code that executes via Azure Databricks. With full elasticity support on Azure, the user or admin doesn’t have to provision execution resources or clusters ahead of time: the size of the Databricks computing cluster grows and shrinks automatically based on the parameters of the transformation job, including data volume and transformation complexity. This allows economies of scale and the reuse of cluster resources when no jobs are running.
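To make the elastic sizing described above more concrete, here is a minimal Python sketch of how a job's parameters might be turned into an autoscaling cluster specification. The `autoscale`, `min_workers`, `max_workers`, `spark_version`, and `node_type_id` field names follow the Databricks Clusters API; everything else is an assumption made for illustration, including the `autoscale_bounds` heuristic, its thresholds, and the placeholder values. It is not how Trifacta actually sizes clusters.

```python
import json
import math


def autoscale_bounds(data_gb: float, num_steps: int, max_cap: int = 20) -> dict:
    """Hypothetical heuristic: larger inputs and longer recipes allow more workers."""
    # Assumed rule of thumb: one worker per 50 GB of input,
    # plus one extra worker per 10 transformation steps.
    wanted = math.ceil(data_gb / 50) + num_steps // 10
    # Always allow at least 2 workers, never more than the cap.
    max_workers = max(2, min(max_cap, wanted))
    return {"min_workers": 1, "max_workers": max_workers}


def new_cluster_spec(data_gb: float, num_steps: int) -> dict:
    """Build a cluster spec whose size range depends on the job parameters."""
    return {
        "spark_version": "<runtime-version>",  # placeholder, not a real version
        "node_type_id": "<azure-vm-type>",     # placeholder, not a real VM type
        "autoscale": autoscale_bounds(data_gb, num_steps),
    }


spec = new_cluster_spec(data_gb=400, num_steps=35)
print(json.dumps(spec, indent=2))
```

With a range rather than a fixed worker count, the platform can add workers while a heavy job runs and drop back toward the minimum afterwards, which is the behavior the paragraph above describes.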

 

On the security front, both Trifacta and Databricks support SSO standards on Microsoft Azure. Each job is authenticated via a special application token that can be granted and revoked via the Databricks console. Because the token is tied to the individual user who has authenticated through the application, full job lineage, governance, and traceability are possible.
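As a rough illustration of per-user token authentication, the Python sketch below builds (but does not send) a job submission request that carries a user's token as a bearer credential. The `/api/2.0/jobs/runs/submit` path and the `Authorization: Bearer` header follow the public Databricks REST API; the workspace URL, token value, payload, and the `build_run_request` helper are hypothetical, and the sketch makes no claim about how Trifacta issues or stores tokens.

```python
import json
import urllib.request


def build_run_request(workspace_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare an authenticated job submission request without sending it."""
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The token identifies the individual user, so each run can be
            # traced back to that user and blocked once the token is revoked.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_run_request(
    "https://example.azuredatabricks.net",
    "dapi-EXAMPLE-TOKEN",
    {"run_name": "wrangle-job"},
)
print(req.get_method(), req.full_url)
```

Since every request carries the submitting user's own token, revoking that token in the Databricks console cuts off that user's jobs without affecting anyone else.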

What’s next

The Azure Databricks integration is a significant achievement in our commitment to support data preparation workloads on the Azure platform, and across all major cloud platforms. We’re excited to continue strengthening our partnerships with Microsoft Azure and Databricks and to support all data preparation users, wherever their data lives.

To learn more about Trifacta, try out our free Wrangler edition by signing up here.