Teaching Data Prep Concepts in the Classroom

June 1, 2018

Trifacta Wrangler, our free personal edition of Trifacta, is being used at hundreds of universities around the world for teaching data prep and data munging concepts. I’ve spoken with half a dozen instructors at colleges and universities in the US and Europe about how data prep fits into a larger data analytics curriculum for their students.

A common theme amongst university professors is the desire to teach students how to handle real-world data. For the instructors I’ve spoken with, that typically means using public data sources or running website scrapers to gather data for analysis; for both of these sources, the data is messy more often than not.

It’s widely known that data cleansing and data prep are the lion’s share of data analysis; by a commonly cited estimate, about 80% of an analyst’s time is spent simply preparing data for downstream analytics. It’s appropriate to mention, too, that by “data analysts” we mean anyone who has to analyze a dataset.

“It was a nice, easy way to introduce data cleansing / data wrangling as a concept.”

John Lochner, Visiting Instructor, Hamline University

Data analytics as a concept is taught in business schools, health science curricula, journalism programs, and other departments with student cohorts who are not necessarily planning to graduate to full-time data analyst jobs. Rather, they see data analysis as a means to an end. Whether they head on to careers as engineers, marketers, scientists, public servants, or something else, analyzing data is now a core part of many jobs.

Providing students with clean and tidy datasets – no mismatched data types, no missing data, no spelling or formatting errors, and all formatted perfectly with the right column labels – is setting them up for a future of hurt. Real-world data is messy. Fortunately, most professors I’ve spoken with are having students download publicly available datasets for analysis – data that’s rife with issues and also very much akin to what’s found in real academic, business, or government jobs.

“Wrangler has spoiled us!”

Chris Claterbos, Lecturer Business Analytics Program Associate, University of Kansas

Additionally, while students are more technically capable every year, they are not all SQL wizards or Python/R coders, nor do they necessarily want to be. But they do want to be able to work with data in whatever form it comes. Microsoft Excel excels as a spreadsheet, but it struggles as a data prep tool when multiple datasets need to be blended together or when datasets are large enough to make Excel crash.

Trifacta Wrangler fits the bill because it can blend data, work with larger datasets, and export to CSV and other file types. It includes SQL-like capabilities such as JOIN and UNION without any coding, and it lets students wrangle data visually, stepping backwards and forwards through their recipe steps or editing steps to refine their work.

And, of course, it’s free!

Despite Wrangler’s successes in the classroom, there’s always room for improvement. If you are teaching data prep in your classroom, whether you’re using Wrangler or not, we would love to hear from you: what are your challenges and needs for teaching data prep?

Lastly, if you’re not giving students an option to try Wrangler for the data prep portion of your course, we invite you to consider trying it yourself. Every single professor I spoke with said that students picked it up quickly through a short demo or our online tutorials. While not everyone required Wrangler (few, in fact, did), most found it was a great tool for students who were learning the basic concepts.