With data of paramount importance in these trying times, we need to ask ourselves a simple question: do we really want to spend 80% of our time cleaning data before we can use it to uncover the truth? This is where Trifacta comes in: we can make your journey to finding the truth easier and faster.
I found the COVID-19 World Vaccination Progress dataset on Kaggle, and I thought it would be perfect for trying out a few of Trifacta's features.
The dataset provides a list of countries, the various brands of vaccines, and data around the number of vaccinated people.
Before you can analyze this dataset, you first need to clean it. I used the vaccines column as an example to walk through some of the basic steps. I don’t do this for a living, but with just a little bit of exploration, this is what I could achieve:
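For readers who prefer text to a recording, here is a rough sketch in plain Python of the kind of cleanup a column like this tends to need. It is not Trifacta's recipe or output. It assumes each cell holds a comma-separated list of vaccine brands, and the sample values and the function name `clean_vaccines_column` are made up for illustration.

```python
# Hypothetical sample cells; the real Kaggle column may look different.
RAW_VACCINES = [
    "Pfizer/BioNTech, Moderna",
    " moderna ,Pfizer/BioNTech",
    "Oxford/AstraZeneca,  Sputnik V",
    "",
]


def clean_vaccines_column(cells):
    """Turn comma-separated cells into sorted lists of consistent names."""
    canonical = {}  # case-insensitive key -> first spelling seen in the column
    cleaned = []
    for cell in cells:
        names = []
        for part in cell.split(","):
            # Trim the ends and collapse repeated inner whitespace.
            name = " ".join(part.split())
            if not name:
                continue  # skip empty pieces, e.g. from a blank cell
            # Reuse the first spelling seen so "moderna" becomes "Moderna".
            name = canonical.setdefault(name.casefold(), name)
            if name not in names:
                names.append(name)
        # Sort so the order in which brands were typed does not matter.
        cleaned.append(sorted(names, key=str.casefold))
    return cleaned


if __name__ == "__main__":
    for row in clean_vaccines_column(RAW_VACCINES):
        print(row)
```

The sketch keeps the first spelling it encounters as the standard one, which is a simplification; on real data you would likely want an explicit mapping of known brand names.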
It really is as simple as it looks. I left the video largely unedited so you can see roughly how long this activity takes.
Are you tasked with standardizing, cleaning, transforming, or blending data as part of the dirty work of data engineering?
Join Trifacta on April 7-9 for the Wrangle Summit 2021, the first event focused exclusively on data engineering, or what we consider the most interesting, messy, and opportunity-rich area of the data lifecycle that comes before analysis.
This virtual event promises to be like no other:
- Play around with the latest tools and techniques for getting data ready for analytics and machine learning
- Explore training and certifications programs to teach you and your team new skills
- Network with peers, industry leaders, and the Trifacta team (yes, even at a virtual event)
And for those particularly interested in the intersection of data preparation and healthcare, you might want to catch our Wrangle Summit session “Why Data is the Ultimate Weapon in Fighting Infectious Diseases” featuring Computational Biologist Ells Campbell from the Centers for Disease Control and Prevention, Head Data Engineer Kalynn Kennon from the University of Oxford’s Infectious Diseases Data Observatory, and Andrew Coe from Genomics England.
Get a sneak peek at our sessions and register for Wrangle Summit by clicking here.