The Centers for Disease Control

Opportunity

By using Trifacta, the Centers for Disease Control (CDC) had the opportunity to improve outbreak detection, better identify transmission patterns, and deliver community-tailored interventions that raise public awareness faster.

Challenges

  • Epidemiologic data sources are diverse and require extensive normalization. The CDC receives data in differing formats from laboratories all over the country.
  • Manual data preparation was time-consuming. CDC analysts merged data by hand in Excel, which took days or even weeks, and because they could not visually explore the raw data, potential data quality issues were hard to spot.
  • Subject matter experts couldn’t access complex data. Technical staff first had to transform public health surveillance data into a readable format, which delayed the analysis conducted by researchers.

Solution with Trifacta

  • With Trifacta, the CDC now completes in three days the same work that previously took three months, and with fewer resources.
  • With Trifacta, the CDC can easily examine and correct data quality issues as they arise, such as misspelled native tribe names. Trifacta also supports visual categorization and easy grouping of attributes.
  • Subject matter experts now have greater access to and visibility into the raw data, which has allowed the team to fold in more relevant data and better assess the accuracy of their assumptions.

Company Background

The Centers for Disease Control (CDC) is the leading national public health institute in the United States.

“We were actually able to shave the amount of time it took to do the analysis by [a factor of] six. Rather than having to do a tremendous amount of analysis, we’re actually readily able to start getting incremental data products out quickly.”