Understanding an unusual HIV outbreak in Scott County, Indiana, required the CDC to quickly analyze a large volume of complex epidemiological data. Traditionally, the team had used a combination of R and Excel spreadsheets to manually standardize and join data from multiple laboratories. This process was extremely time-consuming, often requiring weeks or even months to get the data into an analysis-ready state. With R and Excel, data quality issues were difficult to spot, and researchers had no direct access to complex data. Public health surveillance data, for example, first had to be transformed into a readable format by technical employees, which delayed analysis further.
How Trifacta Solved This Problem
The CDC adopted Trifacta as a more sophisticated replacement for its Excel-based data transformation. With Trifacta, the CDC reduced total time spent transforming data from up to three months down to just three days. Trifacta’s visual data quality profiling lets researchers spot inconsistencies or errors at first glance, and combine and standardize complex data without technical assistance. Best of all, Trifacta allowed the CDC to react to the HIV outbreak in Scott County, Indiana, in record time and determine the root cause of the spread by enriching its analysis with new and increasingly complex data.
With Trifacta, the CDC was able to reduce data preparation from 3 months to 3 days.
Trifacta’s visualization surfaces errors and duplicates and provides an interactive summary of the statistics.
Enriching datasets with Trifacta resulted in an analysis assumption being refuted.
CDC works 24/7 to protect America from health, safety and security threats, both foreign and in the U.S. Whether diseases start at home or abroad, are chronic or acute, curable or preventable, human error or deliberate attack, CDC fights disease and supports communities and citizens to do the same.
“Each new dataset brings new challenges—in Europe, for example, commas need to be replaced with decimal points. With Trifacta, we’re able to easily examine and correct any data quality issues that arise.”
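As a rough illustration of the kind of fix the quote describes, the sketch below converts numeric strings that use a decimal comma into ordinary floating-point values. It is not Trifacta's implementation or the CDC's actual workflow; the function name normalize_decimal is hypothetical, and the rule that a lone comma is a decimal mark is an assumption that would misread US-style thousands separators such as "1,234".

```python
def normalize_decimal(value: str) -> float:
    """Parse a numeric string that may use a decimal comma (hypothetical helper).

    Assumption: when only a comma is present, it is the decimal mark.
    When both separators are present, the one that appears last is
    treated as the decimal mark and the other as a grouping separator.
    """
    cleaned = value.strip().replace(" ", "")
    has_comma = "," in cleaned
    has_point = "." in cleaned

    if has_comma and has_point:
        if cleaned.rfind(",") > cleaned.rfind("."):
            # European style, e.g. "1.234,5": drop points, comma becomes point
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            # US style, e.g. "1,234.5": drop grouping commas
            cleaned = cleaned.replace(",", "")
    elif has_comma:
        # Lone comma is assumed to be a decimal mark, e.g. "3,14"
        cleaned = cleaned.replace(",", ".")

    return float(cleaned)


if __name__ == "__main__":
    for raw in ["3,14", "1.234,5", "1,234.5", "42"]:
        print(raw, "->", normalize_decimal(raw))
```

In practice, the safer choice is usually to set the convention per data source rather than guess it per value, since a string like "1,234" is ambiguous on its own.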