Introducing the Trifacta Python SDK

September 14, 2021

Background

In recent years, Python has become one of the most popular programming languages. Whether you are a beginner or an experienced programmer, Python's simple, easy-to-learn syntax makes code quick to read and easy to integrate with heterogeneous systems. This simplicity makes Python very attractive both for scripting and for connecting the different components of a software system. Python's edit-test-debug cycle is also fast, because there is no separate compilation step, which makes code quick to iterate on and easy to debug.

Python SDK

To make integration easier, Python code can be packaged into custom SDKs that applications build on. A Python SDK typically bundles the packages needed to build and deploy business applications. It can also be installed either in a developer mode or in an end-user mode, so that each intended audience can use the SDK to achieve its objectives.

Introducing the Trifacta Python SDK

We’re excited to introduce the Trifacta Python SDK to help you integrate Trifacta into your existing Python environment and data pipelines. Trifacta is the only open and interactive data engineering cloud platform to collaboratively profile, prepare, and pipeline data for analytics and machine learning. 

The Trifacta Python SDK helps you seamlessly integrate Trifacta into your data science workflows. Within those workflows, you can use Trifacta's extensive capabilities to engineer your data, including exploration, visualization, transformation, and preparation. Trifacta turns raw data into usable, valuable data through a series of transformation steps called a recipe. With the Trifacta Python SDK, you can invoke a function from your Python environment to download a Trifacta recipe as Pandas code. You can then run this code outside the Trifacta environment and deploy it in your data pipelines. Because the recipe can run anywhere, you can bring Trifacta's capabilities into your own environment to help orchestrate robust data pipelines.
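To make the idea of a recipe concrete, here is a minimal sketch of an ordered series of transformation steps written in Pandas. It is not code generated by Trifacta: the step functions (trim_whitespace, drop_missing_email, standardize_country), the run_recipe helper, and the sample columns are all hypothetical, chosen only to show that a recipe is a sequence of steps applied in order, each feeding the next.

```python
import pandas as pd

# Hypothetical recipe steps: each takes a DataFrame and returns a new, transformed one.
def trim_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    """Strip leading and trailing spaces from the text columns."""
    out = df.copy()
    for col in ["name", "country"]:
        out[col] = out[col].str.strip()
    return out


def drop_missing_email(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows that have no email address."""
    return df.dropna(subset=["email"])


def standardize_country(df: pd.DataFrame) -> pd.DataFrame:
    """Upper-case country codes so that 'uk' and 'UK' match."""
    out = df.copy()
    out["country"] = out["country"].str.upper()
    return out


# The (hypothetical) recipe is simply the ordered list of steps.
RECIPE = [trim_whitespace, drop_missing_email, standardize_country]


def run_recipe(df: pd.DataFrame, recipe=RECIPE) -> pd.DataFrame:
    """Apply each step in order, passing the output of one step to the next."""
    for step in recipe:
        df = step(df)
    return df


raw = pd.DataFrame({
    "name": ["  Ada ", "Linus", " Grace"],
    "email": ["ada@example.com", None, "grace@example.com"],
    "country": ["uk ", "fi", "us"],
})
clean = run_recipe(raw)
print(clean)
```

Modeling each step as a DataFrame-in, DataFrame-out function keeps the steps independent, so reordering or removing one is a change to the list rather than to the code inside the steps.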

How it works

This section describes the steps to download your Trifacta recipe as Pandas code, use it in your Python environment, and deploy it into your pipelines. You can get started with Trifacta using our friendly introductory guide.

  • Step 1: Identify the dataset that needs to be transformed and used as clean data for your downstream applications and insights.
  • Step 2: Upload the identified dataset into your Trifacta workspace. 
  • Step 3: Using Trifacta's visual, intuitive interface, define the steps that clean your raw data and transform it into the required format. Within the Trifacta interface, these steps are put together into a series called a recipe.
  • Step 4: For each step of the recipe, Trifacta uses machine learning to give you a visual preview of the transformed data. This lets you review and adjust the step before committing it, so that you get the desired output.
  • Step 5: After you have completed the recipe and committed its steps, run the flow to produce clean, usable data.
  • Step 6: Now that you have a ready recipe, you can invoke a function in your Python notebook to translate the recipe into Pandas code.
  • Step 7: You can then download this code to your local Python environment.
  • Step 8: Finally, you can deploy the recipe's Pandas code into other Python pipelines and use it to transform other datasets, as required.
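Steps 6 through 8 can be sketched as follows. This assumes the exported recipe boils down to a function that takes a DataFrame and returns a cleaned one; exported_recipe below is a hypothetical, hand-written stand-in, not actual output of the Trifacta Python SDK, and run_pipeline and the sample CSV batches are likewise illustrative. The point is that once the recipe is plain Pandas code, any Python pipeline can apply it to new datasets.

```python
import io

import pandas as pd


# Hypothetical stand-in for Pandas code exported from a Trifacta recipe.
# Real exported code will differ; only the DataFrame-in, DataFrame-out shape matters here.
def exported_recipe(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Convert amounts to numbers; values that cannot be parsed become NaN.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Drop rows whose amount could not be parsed.
    out = out.dropna(subset=["amount"])
    # Normalize region names, e.g. " north " -> "North".
    out["region"] = out["region"].str.strip().str.title()
    return out


def run_pipeline(sources) -> pd.DataFrame:
    """Apply the same recipe to every source and combine the cleaned results."""
    cleaned = [exported_recipe(pd.read_csv(src)) for src in sources]
    return pd.concat(cleaned, ignore_index=True)


# Two new batches of raw data arriving after the recipe was built (inline CSV for the demo).
batch_a = io.StringIO("region,amount\n north ,10\nsouth,unknown\n")
batch_b = io.StringIO("region,amount\nEAST,7.5\n")

result = run_pipeline([batch_a, batch_b])
print(result)
```

Because the recipe is an ordinary function, it can be called from a batch script or an orchestrator task without any dependency on the Trifacta environment at run time.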

You can find additional information on how to download and install the Trifacta Python SDK, along with examples, on the Trifacta Python project page at https://pypi.org/project/trifacta/.

Benefits

With an intuitive, visual “guide and decide” interface, Trifacta offers an easier and more efficient solution than writing code by hand, reducing the time spent on feature engineering by up to 80%. Beyond saving time, this increases the productivity and efficiency of data analysts, engineers, and scientists.

Are you ready to reap the benefits of both worlds, Python and data engineering? Sign up for a free Trifacta trial today, and take your data science workflows to the next level with the Trifacta Python SDK.