Class is Now in Session

Presenting The Data School, an educational video series for people who work with data


The Data School with Professor Joe Hellerstein: Collaborative Data Prep

August 19, 2020

Every dataset that matters deserves a second set of eyes. But how can you review a dataset if you don't know what steps were taken to create it? In the past, analysts might have verbally walked a reviewer through how the dataset had been transformed to get sign-off, or at the very least written down their steps. There are two big problems with these methods: first, they're extremely time-consuming; second, it's easy to miss steps along the way.

Modern analysts should expect their data preparation platform to offer the same sophisticated review capabilities found in cloud-based products like Google Docs or Microsoft 365. Just as Google Docs has "version history," any data preparation technology should show exactly which steps were taken and when; in other words, it should provide clear data lineage. And just as Google Docs allows collaborative editing, an effective data preparation platform needs collaborative cloud wrangling. After all, why bother sharing how you completed your work if your colleague can't help correct it?

In the fourth video of our Data School series, Professor Joe Hellerstein shows us how to effectively collaborate on data preparation (and get his grades turned in on time). Watch below to learn more. 


Where else can you find Professor Joe Hellerstein? 

Joe Hellerstein is the Chief Strategy Officer and Co-Founder of Trifacta. You can also find him at UC Berkeley, where he holds the Jim Gray Chair of Computer Science. He has produced many academic resources for public consumption. These include undergraduate course videos on database systems, notes from his graduate course, and research from his team and affiliated labs at UC Berkeley: DSF and RISELab.
