Data Munging

From data munging to data wrangling.

Data munging is the process of manually cleansing data prior to analysis. Without the right tools, it can be a laborious task. The common interface for data munging is often Excel, which lacks the collaboration and automation features needed to make the process efficient. 80% of the time spent on data analytics is allocated to data munging, with IT manually cleaning the data before passing it to the business users who perform the analysis. Data munging is a time-consuming, disjointed process that gets in the way of extracting true value and potential from data.

Instead of using data munging techniques to analyze your data, you should be wrangling data with Trifacta. We’ve developed a six-step guide to data wrangling using Trifacta Wrangler’s features, outlined below:

  1. Discovering: Learn what’s in your raw dataset to think ahead about the best approach for your analytic explorations. This allows you to understand unique elements of the data such as outliers and value distribution to inform the analysis process.
  2. Structuring: This is a critical step because your data comes in all shapes and sizes, and it is up to you to decide the best format to visualize and explore it. Separating, blending, and un-nesting are all important actions in this step.
  3. Cleaning: This step is essential to standardizing your data and ensuring that all inconsistencies (such as null and misspelled values) are addressed. Other values, such as state abbreviations, may need to be standardized to a single format.
  4. Enriching: At this point, you’ve gotten a clear handle on your data – what else could you add to provide more value to your analysis? Enrichment is often about joins and complex derivations. For example, if you’re looking at biking data, perhaps a weather dataset would be an important factor in your analysis.
  5. Validating: Verify that you’ve caught all of the data quality and consistency issues, and go back to address anything you may have missed. Validation should be done on multiple dimensions.
  6. Publishing: This is where you can download and deliver the results of your wrangling effort to downstream analytics tools. Once you’ve published your data, it’s time to move on to the next step: analytics!
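
To make the six steps concrete, here is a minimal sketch of the same flow in plain Python. It is not Trifacta Wrangler and does not use its features; the sample bike-ride data, the weather lookup, and the names `RAW_RIDES`, `WEATHER`, `STATE_ABBREVIATIONS`, `wrangle`, and `publish` are all made up for illustration.

```python
import csv
import io

# Hypothetical raw data: bike rides with a combined city/state field,
# inconsistent state spellings, and one missing duration.
RAW_RIDES = """ride_id,date,start,duration_min
1,2024-05-01,"Oakland, California",12
2,2024-05-01,"Berkeley, CA",
3,2024-05-02,"Portland, oregon",30
"""

# Hypothetical second dataset used for enrichment, keyed by date.
WEATHER = {"2024-05-01": "rain", "2024-05-02": "sun"}

# Hypothetical lookup used to standardize state values.
STATE_ABBREVIATIONS = {"california": "CA", "ca": "CA", "oregon": "OR", "or": "OR"}


def wrangle(raw_text):
    # 1. Discovering: load the rows and count missing values up front.
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    missing_durations = sum(1 for r in rows if not r["duration_min"])

    wrangled = []
    for r in rows:
        # 2. Structuring: split the combined "City, State" field in two.
        city, state = [part.strip() for part in r["start"].split(",", 1)]
        # 3. Cleaning: standardize states and make nulls explicit.
        state = STATE_ABBREVIATIONS.get(state.lower(), state)
        duration = int(r["duration_min"]) if r["duration_min"] else None
        # 4. Enriching: join the weather dataset on the ride date.
        weather = WEATHER.get(r["date"])
        wrangled.append({
            "ride_id": r["ride_id"],
            "date": r["date"],
            "city": city,
            "state": state,
            "duration_min": duration,
            "weather": weather,
        })

    # 5. Validating: collect rows with an unknown state or no weather match.
    valid_states = set(STATE_ABBREVIATIONS.values())
    problems = [
        row["ride_id"]
        for row in wrangled
        if row["state"] not in valid_states or row["weather"] is None
    ]
    return wrangled, missing_durations, problems


def publish(rows):
    # 6. Publishing: serialize to CSV text for a downstream analytics tool.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Calling `wrangle(RAW_RIDES)` returns the cleaned rows, the count of missing durations found during discovery, and a list of ride IDs that failed validation; `publish(rows)` then turns the cleaned rows into CSV text. Doing this by hand for every new dataset is exactly the kind of repetitive work the steps above are meant to streamline.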

For a detailed guide that uses real data to show how each step is done in Trifacta, and to get beyond the laborious task of data munging, download our eBook Six Core Data Wrangling Activities: An Introductory Guide to Data Wrangling with Trifacta.
