Wrangling in the Azure Cloud

March 6, 2018

The world is more cloud-centric than ever

2017 will be remembered as the year in which new application workloads tipped toward the cloud rather than on-premises. According to a Cloud Security Alliance (CSA) report, 60.9% of application workloads were still in enterprise data centers in 2016; by the end of 2017, fewer than half (46.2%) were expected to remain there. We’ve witnessed a similar trend in analytics and machine learning: a growing number of initiatives are hosted in the cloud, with the data itself split between cloud and on-premises locations.

This hybrid mandate fits the Microsoft Azure cloud ecosystem well, and Trifacta can work with data and applications wherever they reside: in the cloud, on-premises, or both. A growing number of customers have adopted Trifacta to wrangle data on Microsoft Azure, including Chubb, Donnelley Financial Solutions, and Etihad Airways. Trifacta on Azure offers the best self-service data preparation capabilities in the market, a position supported by the Dresner Advisory Services user survey for the fourth consecutive year, by the 2017 Forrester Wave report, and by users themselves on Gartner Peer Insights, where Trifacta is rated 4.7 out of 5 across 32 reviews.

To deliver more value to Azure customers, Trifacta has been working hard to support Azure services for data-driven initiatives comprehensively and securely. Trifacta is now listed in the Microsoft Azure Marketplace under the intelligence, analytics, and compute categories. It has also earned the exclusive Microsoft Co-sell status, which recognizes strategic joint customers and deep technical validation of the solution on Azure.

This transition to cloud-centric computing environments was a big reason why our team recently raised a new $48 million financing round. This hybrid, multi-cloud world needs more efficient data preparation, and Trifacta’s interoperable architecture positions it extremely well to be the de facto solution for enterprises that wrangle data for analytics and machine learning initiatives.

Availability on Microsoft Azure Marketplace

Among the additions we’ve made to the Microsoft Azure ecosystem is the official launch of Trifacta Wrangler Enterprise on Azure Marketplace. This offering is optimized for Azure and supports wrangling data from and to Azure Storage Blob, Azure Data Lake Store, and Azure SQL Data Warehouse. It also uses Spark execution on the Microsoft HDInsight platform to process data at scale.

Wrangler Enterprise is designed for demanding data wrangling initiatives that involve many users and large data volumes. It’s also for customers who wish to configure and manage their own HDInsight clusters to use with Trifacta.

Trifacta provides several different options for deploying Wrangler Enterprise via Azure Marketplace. You can choose to create a new HDInsight cluster, add Trifacta to an existing HDInsight cluster, or use a custom Azure Resource Manager (ARM) template. Based on data volume and processing needs, you can choose the appropriate resourcing required when you install via one of the methods mentioned above.

If you have less demanding data wrangling requirements with small to moderate data volumes, and you don’t want to manage the underlying data processing infrastructure, we also offer Wrangler Pro, a hosted edition of Trifacta for individuals and teams. You can find the best Trifacta edition that works for you here.

Trifacta Architecture on Microsoft Azure

Fig 1 – Typical deployment architecture of Trifacta on Microsoft Azure

Trifacta integrates natively with several components and services that are part of the Azure Cloud Platform. Most importantly, it takes key security requirements into account so that data access and processing meet strict enterprise governance standards and protocols.

Storage (Azure Data Lake & Windows Azure Storage Blob)

Using Trifacta, you can wrangle data stored in either Azure Data Lake Store (ADLS) or Windows Azure Storage Blob (WASB). These Azure storage services support a wide variety of use cases, and access to them is governed by the security framework described below.
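For orientation, Hadoop-compatible tools such as HDInsight generally address these two stores through URI schemes: `wasb://` (or `wasbs://` over TLS) for Blob storage and `adl://` for Azure Data Lake Store. The sketch below only builds such URIs. The helper names and the example account, container, and path values are hypothetical, and it does not show how Trifacta itself is configured.

```python
def wasb_uri(container: str, account: str, path: str, secure: bool = True) -> str:
    """Build a Hadoop-style URI for Azure Storage Blob (WASB)."""
    # wasbs:// is the TLS-protected variant of wasb://
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def adls_uri(account: str, path: str) -> str:
    """Build a Hadoop-style URI for Azure Data Lake Store (adl:// scheme)."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical account, container, and file names for illustration only.
    print(wasb_uri("raw", "examplestore", "/sales/2018/orders.csv"))
    print(adls_uri("examplelake", "sales/2018/orders.csv"))
```

URIs of this shape are what a cluster-side job would typically use to read from or write to either store.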

Analytics Store (SQL Data Warehouse)

Once data is wrangled in Trifacta, it can be made available to a variety of downstream analytics platforms and applications. Azure SQL Data Warehouse is the most popular platform for interoperating with analytics applications such as Power BI, Tableau, and Qlik. Trifacta can read from and write to SQL Data Warehouse through either JDBC (for small to medium data volumes) or PolyBase (for larger data volumes).
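The choice between the two interfaces can be pictured as a simple size-based rule. The following is a minimal sketch, not Trifacta's actual logic: the function name, the enum, and the 1 GiB cutoff are assumptions made for illustration, since the post does not state where "small/medium" ends and "larger" begins.

```python
from enum import Enum


class SqlDwInterface(Enum):
    JDBC = "jdbc"          # row-oriented access, suited to small/medium data
    POLYBASE = "polybase"  # bulk loading path, suited to larger volumes


# Hypothetical cutoff for illustration only; the post gives no number.
LARGE_DATA_THRESHOLD_BYTES = 1 * 1024**3  # 1 GiB


def choose_interface(size_bytes: int,
                     threshold: int = LARGE_DATA_THRESHOLD_BYTES) -> SqlDwInterface:
    """Pick an interface for SQL Data Warehouse based on data size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return SqlDwInterface.POLYBASE if size_bytes >= threshold else SqlDwInterface.JDBC


if __name__ == "__main__":
    print(choose_interface(50 * 1024**2).value)   # a 50 MiB extract
    print(choose_interface(200 * 1024**3).value)  # a 200 GiB fact table
```

In practice the right cutoff depends on the workload, so the threshold is exposed as a parameter rather than fixed.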

Data Processing (Photon or Spark via HDInsight)

Whether your data is measured in gigabytes, terabytes, or petabytes, Trifacta can wrangle it on Azure by using the compute engine best suited to the workload. For small to medium data volumes, Trifacta’s unique Photon in-memory compute framework is available within the application running on Azure. For larger volumes, Trifacta integrates natively with Apache Spark running on the latest HDInsight release, v3.6.
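As a rough illustration of the routing described above, a job's total input size could decide which engine runs it. This is a sketch under stated assumptions: the `WranglingJob` type, the `pick_engine` function, and the 1 GiB limit are all hypothetical, and the post does not describe how Trifacta actually makes this decision.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical limit for illustration only; the post names no specific size.
PHOTON_MAX_BYTES = 1 * 1024**3  # 1 GiB


@dataclass(frozen=True)
class WranglingJob:
    name: str
    input_sizes_bytes: Tuple[int, ...]  # size of each input the job reads


def pick_engine(job: WranglingJob, photon_max_bytes: int = PHOTON_MAX_BYTES) -> str:
    """Route small jobs to in-memory Photon and larger ones to Spark."""
    total = sum(job.input_sizes_bytes)
    return "photon" if total <= photon_max_bytes else "spark"


if __name__ == "__main__":
    sample = WranglingJob("sample-profile", (10 * 1024**2, 20 * 1024**2))
    full = WranglingJob("full-history", (5 * 1024**4,))
    print(sample.name, "->", pick_engine(sample))
    print(full.name, "->", pick_engine(full))
```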

Security (SSO, Domain Joined Cluster)

Trifacta Wrangler Enterprise supports secure data access to all the resources provided on Azure through various SSO technologies. By default, you can authenticate through Azure Active Directory (Azure AD), a fully cloud-enabled directory service offered by Microsoft. You can also integrate your existing LDAP directory services with Azure AD and use them for secured access to Trifacta.

For full enterprise security support, you can also configure your HDInsight cluster as a domain-joined cluster that is part of your Active Directory domain. Trifacta supports accessing and running wrangling jobs against a domain-joined cluster. For secured Hive access, Trifacta also supports Apache Ranger in conjunction with HDInsight.

Trifacta is Azure Co-sell Ready

This comprehensive support for Azure data services, together with growing customer adoption on Azure, led Microsoft to certify Trifacta as a Microsoft co-sell partner. Co-sell status reflects not only a significant number of large, strategic joint customers but also the deep technical due diligence Microsoft conducted in reviewing Trifacta’s solution on Azure. For customers, this means the joint solution has been tested at some of the world’s largest enterprises and reviewed in depth by Microsoft Azure experts. This level of vetting gives your organization confidence in rolling out Trifacta on Azure.


