So you’ve decided to move your data analytics to the cloud, at least in part. More specifically, you are adopting Google Cloud Platform (GCP), one of the “big three” cloud providers. Now what? Here at Trifacta, we’ve worked with hundreds of customers in this position and have learned a thing or two about getting up and running with self-service analytics on GCP as successfully as possible. The following five-part blog series is by no means a definitive list, but from our perspective, these tips should be top of mind for optimizing self-service analytics on GCP.
We’ve covered a lot in our five-part series on optimizing self-service analytics on the Google Cloud Platform: the best way to move to the cloud, the importance of self-service data preparation, the partnership needed between business groups and IT and, finally, the roles and responsibilities of key stakeholders. That last title is a bit of a misnomer. While there are certainly key roles and responsibilities in self-service data preparation, we believe organizations should adopt a broader mindset, one that treats the larger umbrella of data quality as everybody’s responsibility.
Lesson 5: Data Quality is Everyone’s Responsibility
Think back to post two, where we discussed ETL and the difficulty a limited IT organization faces in supplying and preparing data for the entire organization. The same is true of maintaining data quality. IT teams were once burdened with maintaining data quality across the whole organization, from ingestion through delivery to the business. Shifting more of that responsibility toward business users is a more efficient approach. For one, instead of a small task force chasing down data quality issues, many more people have eyes on the data and can catch errors. It also leads to better curation for the final analysis, since business users have the best context for the data.
To make this a reality, organizations have shifted their mindset: rather than treating data quality, like data preparation, as the job of a small group, they now recognize that everyone can play a part. Of course, this doesn’t mean that business users will suddenly take on the heavy lifting of data ingestion or data cataloging. IT will still curate the best data and make sure it is sanctioned and reused (which ensures a single version of the truth and increases efficiency). But with business users involved in the finishing steps of cleansing and data preparation, they can ultimately decide what’s acceptable, what needs refining, and when to move on to analysis.
Cloud Dataprep by Trifacta gives business users a way to play their part in data quality. It improves the accuracy, consistency, and completeness of data by applying ML and AI to automate data cleansing procedures. Automation scales to very large data repositories and quickly identifies values that appear to be incorrect, invalid, missing, or mismatched, and it automatically flags outliers that merit closer inspection.
Automated data profiling and cleansing routines spot inconsistencies across the sources being integrated into cloud data storage, highlight probable duplicate data, and recommend fixes for data quality problems that can be applied visually through code-free, automated transformations.
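To make the idea of automated profiling concrete, here is a minimal, rule-based sketch in plain Python. It is not how Cloud Dataprep works internally (the service applies ML, while this uses fixed statistical rules), and the function names `profile_numeric_column` and `find_probable_duplicates` are hypothetical. The sketch assumes a numeric column arrives as raw strings, as it might from a CSV file; it reports missing values, values that fail to parse as numbers, and outliers found with the modified z-score, and it groups rows that match after trimming whitespace and lowercasing as probable duplicates.

```python
import math
import statistics
from typing import Optional, Sequence


def _parse_number(raw: str) -> Optional[float]:
    """Parse a raw string as a finite float; return None if it is not one."""
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def profile_numeric_column(
    values: Sequence[Optional[str]], z_threshold: float = 3.5
) -> dict[str, list[int]]:
    """Return the row indexes that are missing, invalid, or outliers."""
    report: dict[str, list[int]] = {"missing": [], "invalid": [], "outliers": []}
    parsed: list[tuple[int, float]] = []
    for i, raw in enumerate(values):
        if raw is None or raw.strip() == "":
            report["missing"].append(i)
        elif (number := _parse_number(raw)) is None:
            report["invalid"].append(i)
        else:
            parsed.append((i, number))

    if parsed:
        numbers = [x for _, x in parsed]
        median = statistics.median(numbers)
        # Median absolute deviation: a spread estimate that a few extreme
        # values cannot inflate the way they inflate a standard deviation.
        mad = statistics.median(abs(x - median) for x in numbers)
        if mad > 0:
            for i, x in parsed:
                # Modified z-score (Iglewicz and Hoaglin); 3.5 is their suggested cutoff.
                if abs(0.6745 * (x - median) / mad) > z_threshold:
                    report["outliers"].append(i)
    return report


def find_probable_duplicates(rows: Sequence[tuple[str, ...]]) -> list[list[int]]:
    """Group row indexes whose fields match after collapsing whitespace and lowercasing."""
    groups: dict[tuple[str, ...], list[int]] = {}
    for i, row in enumerate(rows):
        key = tuple(" ".join(field.split()).lower() for field in row)
        groups.setdefault(key, []).append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


if __name__ == "__main__":
    amounts = ["12.5", "13.0", "", "abc", "12.8", "13.1", "950", "12.9", None]
    print(profile_numeric_column(amounts))
    # {'missing': [2, 8], 'invalid': [3], 'outliers': [6]}
    customers = [("Acme Corp", "NY"), ("acme  corp ", "ny"), ("Globex", "SF")]
    print(find_probable_duplicates(customers))
    # [[0, 1]]
```

The median-based score is used here because a single extreme value such as 950 would inflate an ordinary standard deviation enough to partly hide itself. In practice, the flagged rows would be surfaced for a business user to review rather than changed automatically.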
As new structured, unstructured, or semistructured data is ingested and integrated into the cloud, data quality is continuously validated. “Continuous validation” means that users don’t have to wait until the end of the validation process to view and test the results, a delay that’s incompatible with today’s agile development methodologies.
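The difference between continuous validation and end-of-process validation can be sketched with a Python generator: each batch is checked the moment it is ingested and its report is available immediately, instead of all results arriving after the last batch. This is an illustrative sketch only, not Cloud Dataprep’s mechanism; the function `validate_batches`, the row shape (a dict per row), and the example rules are all hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]
Rule = Callable[[Row], bool]


def validate_batches(
    batches: Iterable[list[Row]], rules: dict[str, Rule]
) -> Iterator[dict[str, object]]:
    """Check each batch as it arrives and yield its report immediately."""
    for batch_no, batch in enumerate(batches):
        failures = {
            name: [i for i, row in enumerate(batch) if not rule(row)]
            for name, rule in rules.items()
        }
        # Keep only rules that actually failed, so a clean batch reports {}.
        yield {
            "batch": batch_no,
            "rows": len(batch),
            "failures": {name: idxs for name, idxs in failures.items() if idxs},
        }


if __name__ == "__main__":
    # Hypothetical rules for an "amount" field.
    rules: dict[str, Rule] = {
        "amount_present": lambda r: r.get("amount") is not None,
        "amount_non_negative": lambda r: r.get("amount") is None or r["amount"] >= 0,
    }
    incoming = iter([
        [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}],
        [{"id": 3, "amount": -5.0}],
    ])
    for report in validate_batches(incoming, rules):
        print(report)  # each report prints as soon as its batch is checked
```

Because the generator is lazy, a problem in the first batch is visible before later batches have even arrived, which is the property that makes this style fit iterative, agile workflows.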
The Next Step Toward Self-Service
As our series comes to a close, one of the biggest takeaways about self-service analytics on Google Cloud Platform is that self-service data preparation is critical. Without clean data, you can’t have trusted analysis, and without self-service data preparation, you won’t have effective self-service analytics. On Google Cloud Platform, self-service data preparation means Cloud Dataprep by Trifacta, a user-friendly service powered by machine learning. To learn more about Cloud Dataprep and how to get started for free, click here.
Finally, to see all of our tips for successful self-service analytics on GCP in one place, you can download our eBook, “Self-Service Analytics on the Google Cloud Platform: Five Data Preparation Lessons Learned to Ensure Success.”