Visual Profiling for Data Transformation

October 14, 2014

How do you feel when encountering a data set for the first time? Perhaps you feel the dread of the unknown: what scripting, cleaning, and interpretation hurdles await? Or you may find that the data is so large that it renders common tools like spreadsheets useless. Over my career working in data, I have felt these anxieties far too many times when starting out with a new data set. But this is not how it should be! Ideally, encounters with new data should bring the childlike excitement of opening a present: unboxing the content with eager anticipation and exploring what lies inside.

At Trifacta, we’re focused on fundamentally changing the experience of data transformation and providing delightful “first-touch” experiences with data. This means more than transforming (shaping, enriching) data. It means creating shareable, reusable processes and helping users get to know the shape and structure of their data. When done well, the process of data transformation lays the foundation for successful and repeatable analyses.

Towards this goal, we’re excited to announce a new capability in Trifacta v2 that makes analysts even more productive: Visual Profiling. Building on our Predictive Interaction™ approach to data transformation, Visual Profiling provides dynamic, interactive visualizations to help you see, understand, and improve your data.

Visual Profiling first analyzes your data to automatically choose visualizations designed to support data quality assessment and transformation. For example, look at the time series data above. Based on the data type (timestamps) and distribution (variation over days, months, and years — but not hours or minutes) we present a collection of helpful visualizations: a timeline histogram (note the increasing activity at the end of each quarter) as well as summaries by month, day of week (less activity on weekends), and day of month (more activity on the last day of a month). Rather than forcing a user to manually build up each chart from scratch, Trifacta automatically presents a visual tour of relevant data.
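To make the idea concrete, here is a minimal Python sketch of choosing time-based summaries from the data itself. It is an illustration under stated assumptions, not Trifacta's implementation: the helper names `varying_fields` and `summaries` are hypothetical, and the sample timestamps are invented for the example. The sketch checks which calendar fields actually vary (so hours and minutes are skipped when every record is a whole day) and then counts records by month, day of week, and day of month.

```python
from collections import Counter
from datetime import datetime

# Hypothetical helpers for illustration only; not Trifacta's API.
FIELDS = ("year", "month", "day", "hour", "minute")


def varying_fields(timestamps):
    """Return the calendar fields that take more than one distinct value."""
    return [f for f in FIELDS if len({getattr(t, f) for t in timestamps}) > 1]


def summaries(timestamps):
    """Count records by month, day of week (0 = Monday), and day of month."""
    return {
        "month": Counter(t.month for t in timestamps),
        "day_of_week": Counter(t.weekday() for t in timestamps),
        "day_of_month": Counter(t.day for t in timestamps),
    }


# Invented sample data: whole-day timestamps clustered at month ends.
sample = [
    datetime(2013, 12, 31),
    datetime(2014, 3, 31),
    datetime(2014, 3, 31),
    datetime(2014, 6, 28),
]

fields = varying_fields(sample)   # fields worth charting
charts = summaries(sample)        # counts backing each summary chart
```

In this sample only the year, month, and day vary, so an hourly chart would add nothing, while the day-of-month counts show the clustering on the 31st.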

Or, consider this example of zip code data. Trifacta automatically recognizes zip codes as a data type and uses that type information to present the data as an interactive map. Still, the underlying data is represented as strings (text), so the histogram display shows the distribution of string lengths. The histogram immediately reveals that our data has both 5-digit and 9-digit zip codes. If we click the bar for 9-digit codes, Trifacta’s Predictive Interaction™ technology kicks in to suggest possible transformations. Here, we can standardize the data by mapping all 9-digit zip codes to 5 digits. The goal of Visual Profiling is to use visualizations to help you understand your data and, just as importantly, provide a springboard for further correction and refinement.
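The zip code example can be sketched in a few lines of Python. This is only an illustration of the two steps described above (a histogram of string lengths, then a mapping from 9-digit codes to 5 digits), not the transformation Trifacta generates. The function names, the regular expression, and the sample values are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed pattern: five digits, an optional hyphen, then four more digits.
ZIP9 = re.compile(r"(\d{5})-?\d{4}")


def length_histogram(values):
    """Count how many strings there are of each length."""
    return Counter(len(v) for v in values)


def to_zip5(value):
    """Truncate a 9-digit zip code to its first 5 digits; leave others as-is."""
    match = ZIP9.fullmatch(value)
    return match.group(1) if match else value


# Invented sample data mixing 5-digit and 9-digit codes.
zips = ["94105", "941051234", "100011234", "02139"]

hist = length_histogram(zips)          # lengths 5 and 9 both appear
cleaned = [to_zip5(z) for z in zips]   # every code is now 5 digits
```

Keeping the values as strings matters here: converting to integers would drop the leading zero in codes such as 02139.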

The description above only begins to scratch the surface of what we’re looking to achieve with Visual Profiling for data transformation. Trifacta Visual Profiling can organize data by potential anomalies (missing data, type mismatches, extreme values), automatically suggest related columns (what variables might best explain observed outliers or missing values?), and support many of the same rich interactions already in Trifacta, such as brushing and linking (select values in one column to see how they relate to other columns). Moreover, Visual Profiling works with data of all shapes and sizes: start on a sample as you build up a series of transformations, then later visualize the entirety of your transformed data.
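As a rough sketch of what organizing a column by potential anomalies could look like, the Python below sorts raw string values into missing, type-mismatched, and numeric groups, then flags extreme values. The 1.5 × interquartile-range rule is an assumption chosen for the example (the post does not say how Trifacta defines extreme values), and the function names and sample column are hypothetical.

```python
from statistics import quantiles  # available in Python 3.8+


def profile_column(values):
    """Split raw values into missing rows, mismatched rows, and parsed numbers."""
    missing, mismatched, numbers = [], [], []
    for i, v in enumerate(values):
        if v is None or v.strip() == "":
            missing.append(i)            # no data at all
            continue
        try:
            numbers.append((i, float(v)))
        except ValueError:
            mismatched.append(i)         # present, but not a number
    return missing, mismatched, numbers


def extreme_indices(numbers, k=1.5):
    """Flag rows outside k * IQR of the quartiles (an assumed rule)."""
    q1, _, q3 = quantiles([x for _, x in numbers], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, x in numbers if x < low or x > high]


# Invented sample column that is expected to hold numbers.
column = ["10", "12", "", "11", "abc", "13", "12", "500", None, "11"]

missing, mismatched, numbers = profile_column(column)
extreme = extreme_indices(numbers)
```

Returning row indices rather than values makes it easy to select the same rows in other columns, which is the idea behind brushing and linking.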

I encourage you to see Visual Profiling in action for yourself at the Strata + Hadoop World conference in New York City this week by attending my session Designing with Data or visiting Trifacta’s Booth #143.