Discovering exactly what is in your data, and how it might serve different analytic explorations, is key to quickly identifying a dataset's value and potential uses. This exploration process gives you an understanding of the unique elements of the data, such as value distributions and outliers, that informs the transformation and analysis process.
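A discovery pass of this kind can be sketched in plain Python. The records, field names, and two-standard-deviation outlier threshold below are illustrative assumptions, not part of any Trifacta feature:

```python
from collections import Counter
import statistics

# Hypothetical sample records; in practice these would come from the
# dataset being explored.
rows = [
    {"state": "CA", "amount": 95.0},
    {"state": "CA", "amount": 98.0},
    {"state": "CA", "amount": 100.0},
    {"state": "CA", "amount": 100.0},
    {"state": "NY", "amount": 102.0},
    {"state": "NY", "amount": 105.0},
    {"state": "NY", "amount": 97.0},
    {"state": "NY", "amount": 4800.0},  # a likely outlier
    {"state": "TX", "amount": 101.0},
    {"state": "TX", "amount": 99.0},
    {"state": "TX", "amount": 103.0},
]

# Value distribution of a categorical field.
state_counts = Counter(row["state"] for row in rows)

# Flag numeric values more than two standard deviations from the mean.
amounts = [row["amount"] for row in rows]
mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)
outliers = [a for a in amounts if abs(a - mean) > 2 * stdev]
```

Surfacing `state_counts` and `outliers` before any transformation is written is exactly the kind of early insight that shapes the rest of the wrangling process.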
What is Data Wrangling?
Successful analysis relies upon accurate, well-structured data that has been formatted for the specific needs of the task at hand. Yet today's data is bigger and more complex than ever before, and it is time-consuming and technically challenging to wrangle it into a form fit for analysis. Data wrangling is the process of transforming raw data source inputs into prepared outputs that can be used for analysis and other business purposes.
What is Trifacta?
At Trifacta, we’re focused on providing software that helps individuals and organizations more efficiently explore, transform and join together diverse data for analysis. Whether you’re trying to improve the efficiency of an existing analysis process or utilize new sources of data for a new initiative, Trifacta’s data wrangling solutions empower you to do more with data of all shapes and sizes.
Visually explore, transform, clean and join together diverse desktop files. Completely free.
- Intelligent recommendations for cleaning and formatting data
- Best-of-breed hybrid desktop application

Utilize advanced self-service data preparation for teams and departments.
- Leverage a shared platform for data preparation
- Automate wrangling operations that span diverse sources

Deploy an enterprise data wrangling platform for large-scale analytics initiatives.
- Empower analyst teams while maintaining governance
- Diverse deployment & processing support to scale to any workload
The Data Wrangling Process in Trifacta
Structuring is needed because data comes in all shapes and sizes. Data lacking human-readable structure is difficult to work with using traditional applications. Even well-structured datasets often lack the proper formatting or appropriate level of aggregation required for the analysis at hand.
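Both aspects of structuring, imposing named fields on raw input and aggregating to the level the analysis needs, can be sketched in plain Python. The pipe-delimited lines and field names are illustrative assumptions:

```python
from collections import defaultdict

# Hypothetical raw export: pipe-delimited lines with no structure a
# typical analysis tool can use directly.
raw_lines = [
    "2021-03-01|a-100|25.00",
    "2021-03-02|a-100|40.00",
    "2021-03-02|b-200|10.00",
]

# Structure each line into a record with named, typed fields.
records = []
for line in raw_lines:
    date, customer, amount = line.split("|")
    records.append({"date": date, "customer": customer, "amount": float(amount)})

# Aggregate to the per-customer level the analysis at hand requires.
totals = defaultdict(float)
for record in records:
    totals[record["customer"]] += record["amount"]
```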
Cleaning involves removing or correcting data that might distort the analysis. A null value, for example, might bring an analytic package to a screeching halt; it may need to be replaced with a zero or an empty string. Particular fields may need to be standardized by replacing the many different ways a value such as a state might be written out, such as CA, Cal and Calif, with a single standard format.
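Both cleaning operations, replacing nulls and standardizing variant spellings, can be sketched in plain Python. The alias table and field names are illustrative assumptions:

```python
# Hypothetical mapping from observed variants to a standard form.
STATE_ALIASES = {"CA": "CA", "Cal": "CA", "Calif": "CA"}

rows = [
    {"state": "Calif", "revenue": 10.0},
    {"state": "CA", "revenue": None},  # a null that could halt an analysis
    {"state": "Cal", "revenue": 5.0},
]

cleaned = []
for row in rows:
    cleaned.append({
        # Standardize the many spellings to a single format.
        "state": STATE_ALIASES.get(row["state"], row["state"]),
        # Replace nulls with zero so downstream tools don't choke.
        "revenue": 0.0 if row["revenue"] is None else row["revenue"],
    })
```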
Enriching allows you to augment the scope of your analysis by incorporating disparate internal or 3rd-party data into your analysis. This step includes executing common preparation tasks such as joins, unions or complex derivations. Purchase transaction data, for example, might benefit from data associated with each customer's profile or historical purchase patterns.
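The join described above, attaching customer profile attributes to purchase transactions, can be sketched in plain Python. The record shapes and the "unknown" fallback are illustrative assumptions:

```python
# Hypothetical purchase transactions and customer profiles.
purchases = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 12.5},
    {"customer_id": 3, "amount": 8.0},  # no matching profile
]
profiles = {
    1: {"segment": "loyal"},
    2: {"segment": "new"},
}

# Left join: enrich each purchase with its customer's profile attributes,
# falling back to a placeholder when no profile exists.
enriched = [
    {**p, **profiles.get(p["customer_id"], {"segment": "unknown"})}
    for p in purchases
]
```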
Validating is the activity that surfaces data quality and consistency issues, or verifies that they have been properly addressed by applied transformations. Validations should be conducted along multiple dimensions: at a minimum, assess whether the values of an attribute or field adhere to syntactic constraints (such as type and format) as well as distributional constraints (such as expected ranges).
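The two kinds of checks can be sketched in plain Python. The zip-code format rule and the plausible age range are illustrative assumptions:

```python
import re

rows = [
    {"zip": "94107", "age": 34},
    {"zip": "9410", "age": 230},  # fails both kinds of checks
]

# Syntactic constraint: this field should be exactly five digits.
syntactic_failures = [
    r for r in rows if not re.fullmatch(r"\d{5}", r["zip"])
]

# Distributional constraint: values should fall in a plausible range.
distributional_failures = [
    r for r in rows if not 0 <= r["age"] <= 120
]
```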
Publishing refers to planning for and delivering the output of your data wrangling efforts for downstream project needs (like loading the data into a particular analysis package) or for future project needs (like documenting and archiving transformation logic). Downstream analytic tools often see dramatic performance gains when the data they receive is structured in the fashion they expect.
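Delivering output in a format a downstream tool accepts can be sketched with Python's standard csv module; the field names are illustrative:

```python
import csv
import io

# Cleaned, well-structured records ready for a downstream tool.
rows = [
    {"state": "CA", "revenue": 10.0},
    {"state": "NY", "revenue": 7.5},
]

# Serialize to CSV, a format most analysis packages can load directly.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["state", "revenue"])
writer.writeheader()
writer.writerows(rows)
output = buf.getvalue()
```

In practice the same records could just as easily be written to a file, a database table, or a columnar format, whatever the downstream tool expects.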
Who is it for?
How It Works
Trifacta sits between data storage and processing environments and the visualization, statistical or machine learning tools used downstream in the analysis process. Our solution is designed to help data analysts do the work associated with data preparation without having to manually write code or use complex mapping-based systems.
With Trifacta, users are able to Interactively Explore the content of their data and, through a process called Predictive Transformation, define a recipe for how the dataset should be transformed. This logic defines how the data is processed, whether on your desktop, a server, a cloud environment or Hadoop. Prior to executing the transformation, the user specifies the desired location and format for the clean, well-structured output dataset to be used in analysis.
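The recipe idea can be sketched as an ordered list of transformation functions applied in sequence. This is a simplified illustration, not Trifacta's actual recipe format; the step names and record shapes are assumptions:

```python
# Each recipe step takes a list of records and returns a new list.
def drop_null_values(rows):
    return [r for r in rows if r["value"] is not None]

def uppercase_state(rows):
    return [{**r, "state": r["state"].upper()} for r in rows]

# A "recipe": transformation logic defined once, then run anywhere.
recipe = [drop_null_values, uppercase_state]

def run_recipe(rows, recipe):
    for step in recipe:
        rows = step(rows)
    return rows

result = run_recipe(
    [{"state": "ca", "value": 1}, {"state": "ny", "value": None}],
    recipe,
)
```

Because the recipe is data-independent logic, the same sequence of steps could in principle be executed against a desktop file or dispatched to a larger processing environment.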
Trifacta has brought an entirely new level of productivity to the way our analyst and IT teams work together to explore diverse data and define analytic requirements.