Transform by Example: Your Data Cleaning Wish is Our Command

June 18, 2019

What if you had a genie for your data? We believe Transform by Example gets us closer to providing just that (and the three-wish limit does not apply). As part of our ongoing effort to make data cleaning both powerful and intuitive, we’re introducing Transform by Example this week into our free Wrangler edition. Transform by Example is a new paradigm in data interaction: rather than directly creating data transformation steps, you provide examples of how you’d like the end state of your data to look, and Trifacta figures out the steps needed to get there.

Transform by Example

One of the most common tasks analysts need to perform is pattern reformatting – converting multiple formats of data into a single format, by manipulating delimiters, tokens, and word lengths, while preserving semantic content. For example, suppose you have a column of phone numbers that you’d like to reformat into the common +1 ### ### #### US format.

Visualizing phone number data

Writing out data transformations to solve this task can be a time-consuming and error-prone process, especially because the data may contain many different phone number formats, as demonstrated above by the Patterns interface. Moreover, Trifacta’s intelligent suggestions may not always apply to the data types you’re trying to manipulate, especially when you’d like to create a new format not already present in your data (in this example, adding the country code +1).
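
To give a sense of what authoring this by hand involves, here is a minimal Python sketch of the phone number reformatting. It is not Trifacta recipe syntax; the function name `to_us_format` and the rule for handling a leading country code are assumptions made for illustration.

```python
import re
from typing import Optional


def to_us_format(raw: str) -> Optional[str]:
    """Reformat a US phone number as '+1 ### ### ####'.

    Returns None when the input does not contain exactly ten digits
    (or eleven digits starting with the country code 1).
    """
    # Drop every delimiter: dots, dashes, spaces, parentheses, plus signs.
    digits = re.sub(r"\D", "", raw)

    # Assumption: an 11-digit number starting with 1 already carries the country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]

    if len(digits) != 10:
        return None

    return f"+1 {digits[:3]} {digits[3:6]} {digits[6:]}"


print(to_us_format("236.926.9604"))  # +1 236 926 9604
```

Even this short function has to decide how to treat delimiters, existing country codes, and malformed values, which is the kind of detail that makes hand-written transforms error-prone.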

On the other hand, you know exactly what you’d like your data to look like. For example, given the first record of the input column, “236.926.9604”, you know that you want it to look like “+1 236 926 9604”. Wouldn’t it be nice if you could simply provide this knowledge of the end result to Trifacta, and have it figure out the rest?

This is exactly the objective of Transform by Example. Rather than authoring transforms, you type out one or more examples of what you’d like your output records to look like, and Trifacta creates the transform to get you there.

Typing out an example

After entering the example on the first row, Trifacta infers exactly the kind of transformation you’re trying to do. It applies this transformation to your input column, and provides you with a preview of what your data will look like once committed. If you’re not satisfied with what Trifacta predicts, you can simply add more examples for different input records until you’re happy with the results. Finally, you can add the transformation as a step to your recipe, which can eventually be executed at scale on your full dataset.
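
The idea of learning a transformation from a single example can be illustrated with a toy sketch in Python. This is not Trifacta’s algorithm, which is not described in detail here; it is a deliberately simple stand-in that assumes the output is built from literal text plus alphanumeric tokens copied from the input. The names `tokens`, `learn`, and `run` are hypothetical.

```python
import re
from typing import List, Optional, Union

# A program is a list of parts: a str is literal text,
# an int is the index of an input token to copy.
Part = Union[str, int]


def tokens(s: str) -> List[str]:
    """Split a string into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", s)


def learn(example_in: str, example_out: str) -> List[Part]:
    """Infer a program from one input/output example (greedy, left to right)."""
    toks = tokens(example_in)
    program: List[Part] = []
    i = 0
    while i < len(example_out):
        # Find the longest input token that appears at this position of the output.
        best = None
        for idx, tok in enumerate(toks):
            if example_out.startswith(tok, i) and (best is None or len(tok) > len(toks[best])):
                best = idx
        if best is not None:
            program.append(best)
            i += len(toks[best])
        elif program and isinstance(program[-1], str):
            program[-1] += example_out[i]  # extend the current literal
            i += 1
        else:
            program.append(example_out[i])  # start a new literal
            i += 1
    return program


def run(program: List[Part], s: str) -> Optional[str]:
    """Apply a learned program to a new input; None if a token is missing."""
    toks = tokens(s)
    out = []
    for part in program:
        if isinstance(part, int):
            if part >= len(toks):
                return None
            out.append(toks[part])
        else:
            out.append(part)
    return "".join(out)


program = learn("236.926.9604", "+1 236 926 9604")
print(run(program, "(415) 555-0132"))  # +1 415 555 0132
```

A sketch like this breaks down as soon as an input has a different token layout, for example a number that already starts with a country code. That is where supplying further examples for the rows that look wrong comes in.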

Formatting heterogeneous dates by example

Let’s take a look at another example. The start_date column above is not in the format we need for our downstream analysis. There’s also a data quality issue: the column contains several different date formats. We can tackle both of these issues easily using Transform by Example.
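
For comparison, a hand-written approach to the same date problem might look like the Python sketch below. The list of input formats and the ISO 8601 target format are assumptions chosen for illustration, since the actual formats in the start_date column are only visible in the screenshot, and `normalize_date` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats; a real column may need a different list.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",   # 2019-03-14
    "%m/%d/%Y",   # 03/14/2019
    "%d %b %Y",   # 14 Mar 2019
    "%B %d, %Y",  # March 14, 2019
]


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


for value in ["03/14/2019", "14 Mar 2019", "March 14, 2019", "not a date"]:
    print(value, "->", normalize_date(value))
```

Every format has to be anticipated and listed up front, and ambiguous values such as 03/04/2019 are silently read as month first. Typing the desired result for a few rows sidesteps maintaining that list by hand.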

Under the hood, Trifacta’s algorithm draws on state-of-the-art research in string processing, machine learning, and graph theory to predict the transform you’re trying to apply. We will continue to refine and extend this algorithm to handle new types of data and transformations. Like the rest of Trifacta’s product, Transform by Example uses machine learning to become smarter over time as users interact with it.

We’re very excited about this feature, as it reflects our core philosophy of empowering the user to provide their expertise and knowledge while minimizing repetitive effort. Transform by Example is available now in Trifacta Wrangler.
