Every dataset that matters deserves a second set of eyes. But how can you review a dataset if you don’t know what steps were taken to create it? In the past, analysts might have walked a reviewer through their transformations verbally to get sign-off, or at the very least written down their steps. There are two big problems with these methods: first, they’re extremely time-consuming, and second, there’s a big risk of missing steps along the way.
Modern analysts should expect their data preparation platform to offer the same sophisticated review capabilities found in cloud-based products like Google Docs or Microsoft 365. In the same way that Google Docs has “version history,” any data preparation technology should show exactly what steps were taken and when; in other words, it should provide clear data lineage. And just as Google Docs allows for collaborative editing, an effective data preparation platform needs collaborative cloud wrangling. After all, why bother sharing how you completed your work if your colleagues can’t help correct it?
In the fourth video of our Data School series, Professor Joe Hellerstein shows us how to effectively collaborate on data preparation (and get his grades turned in on time). Watch below to learn more.
Where else can you find Professor Joe Hellerstein?
Joe Hellerstein is the Chief Strategy Officer and Co-Founder of Trifacta. You can also find him at UC Berkeley as the Jim Gray Chair of Computer Science. He has produced many academic resources for public consumption, including undergraduate course videos on database systems, notes from his graduate course, and research from his team and affiliated labs at UC Berkeley: DSF and RISELab.