Trifacta for Data Engineers: Deployment Manager

December 4, 2018

Recently, we announced new functionality to support data engineers within growing data operations (DataOps) practices. We’ve already discussed how RapidTarget and Automator allow data engineers to set a predefined schema target and intelligently manage the scaling, scheduling, and monitoring of data prep workflows into production. Now, we’re excited to talk a little more about Deployment Manager, a framework for testing and managing data prep workflows within a software development lifecycle.

Introducing Deployment Manager

To ensure the reliability and accuracy of data preparation flows, data engineers must put workflows through rigorous testing before promoting them to production and versioning them. This work ensures that analysts using Trifacta to clean, structure, and blend data will see their effort multiplied through productionalized flows. Often, this means managing data prep recipes and flows across multiple environments: development, staging (or testing), and production. It’s also common for data engineers to apply policies and tools as part of an overall software development lifecycle (SDLC) framework.

In Trifacta, the Deployment Manager helps data engineers and data analysts collaborate closely through this process.

To start, analysts or data engineers can develop a set of recipes in a flow and use the Export Flow function to download a deployment package file. This file can be annotated with additional information by the analyst or the data engineers responsible for deployment. It can also be checked into a third-party version control system such as GitHub or SVN for source control management.
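As a rough sketch, exporting a flow package via the API might look like the following. The base URL, endpoint path, and auth header here are assumptions made for illustration; consult the API documentation for your Trifacta release for the actual paths and authentication scheme.

```python
import urllib.request

# Hypothetical instance URL -- replace with your own development environment.
BASE_URL = "https://dev.example.com/v4"

def export_flow_request(flow_id: int, token: str) -> urllib.request.Request:
    """Build the HTTP request that downloads a flow's deployment package.

    The /flows/{id}/package path is an assumed endpoint shape, not a
    documented contract -- check the API reference for your version.
    """
    req = urllib.request.Request(f"{BASE_URL}/flows/{flow_id}/package")
    req.add_header("Authorization", f"Bearer {token}")
    return req

# The returned package could then be fetched with
# urllib.request.urlopen(req) and committed to version control.
```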

Next, data engineers, or anyone who has been assigned a “deployment” role, can retrieve this package either from a source control system or through the appropriate Trifacta API. In a staging or production environment, which can be a completely separate environment from a network perspective, the package can be reintroduced into a new or existing deployment bundle. Any environment-specific information, such as data source and connection metadata, can be remapped in the user interface or through the API, since what’s in the development environment may differ substantially from staging or production.
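Conceptually, the remapping step is a substitution of environment-specific identifiers. A minimal sketch, assuming connection metadata is available as a mapping of data source names to connection IDs (the structure here is illustrative, not the actual package format):

```python
def remap_connections(package_connections: dict, env_map: dict) -> dict:
    """Swap development connection IDs for their staging/production
    equivalents; IDs without a mapping are left unchanged."""
    return {source: env_map.get(conn_id, conn_id)
            for source, conn_id in package_connections.items()}
```

For example, given `{"sales_raw": "dev-conn-1", "crm_export": "dev-conn-2"}` and a production map of `{"dev-conn-1": "prod-conn-9"}`, only the first data source is re-pointed; the unmapped connection passes through untouched.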

Deployments are automatically versioned in the Trifacta repository: each new import increments the version of the deployment bundle and updates its version history. Once a deployment bundle is deployed and marked as active, it can be executed via the user interface or the API, which runs all flows within the deployment.
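The automatic versioning behaves like a monotonically increasing counter per deployment bundle. As a toy illustration of that bookkeeping (the repository handles this internally; this is not Trifacta code):

```python
def next_version(history: list) -> int:
    """Each new import increments the bundle's version number;
    the first import of a bundle starts at version 1."""
    return max(history) + 1 if history else 1
```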

If for any reason data engineers are not satisfied with a new version or new deployment, a previous version can be marked as active, in essence rolling back to the state before the new deployment was made. The deployment admin can even re-export the exact version in staging or production and send it back to the data analyst for re-examination, or simply point to the appropriate version stored in the external version control system. This flexibility allows data engineers and analysts to choose the model that works best in their specific scenario.
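Rolling back amounts to re-pointing the “active” marker at an earlier entry in the version history. A minimal sketch of that logic, with an assumed in-memory representation of the history:

```python
def rollback(versions: list, active: int) -> int:
    """Return the version preceding `active`, i.e. the one to mark
    active when rolling back; raise if nothing precedes it."""
    idx = versions.index(active)
    if idx == 0:
        raise ValueError("no earlier version to roll back to")
    return versions[idx - 1]
```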

As mentioned, most Deployment Manager actions can be performed through the user interface, but many users may choose to leverage the APIs instead. It’s a software development lifecycle best practice to automate as much of the process as possible and set up monitoring accordingly. Trifacta provides a full set of deployment APIs to help users export, import, and deploy their data prep flows across multiple environments. It’s a critical piece of the advanced operationalization functionality that Trifacta provides.
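An automated promotion pipeline typically chains the export, import, and run calls described above. The endpoint paths in this sketch are hypothetical placeholders, not documented endpoints; check the deployment API reference for your release before wiring this into CI:

```python
def promotion_steps(flow_id: int, deployment_id: int) -> list:
    """Ordered (method, path) pairs for promoting a flow from a
    development environment into a production deployment bundle.
    Paths are assumed shapes for illustration only."""
    return [
        ("GET",  f"/v4/flows/{flow_id}/package"),               # export from dev
        ("POST", f"/v4/deployments/{deployment_id}/releases"),  # import the package
        ("POST", f"/v4/deployments/{deployment_id}/run"),       # execute all flows
    ]
```

A scheduler or CI job would iterate over these pairs, issuing each call against the appropriate environment and halting on failure.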

What’s Next

With the launch of Deployment Manager—and all of the latest functionality specifically geared toward data engineers—we’re excited to see organizations take their data preparation efforts to new heights. Repeatable and scalable processes are key for data engineers who are, under the broader umbrella of DataOps, constantly looking for ways to increase the velocity, quality, and reliability of analytics.

All of the features mentioned in this blog series—RapidTarget, Automator, and Deployment Manager—are immediately available in our Enterprise edition. If you’d like to see them in action for yourself, schedule a demo with our team.