Health Scientist, CDC
Manager for Customer Success, Trifacta
Chief Scientist, Leidos
In some areas of the United States, the spread of HIV/AIDS is gaining new traction from the opioid epidemic. Identifying and suppressing these outbreaks is difficult, however, because transmission between persons with acute HIV infection is extraordinarily hard to detect via routine investigations. The Microbial Transmission Network Team (MTNT), part of the Centers for Disease Control and Prevention (CDC), set out to leverage new techniques in data collection, preparation, and visualization to enhance its understanding of transmission dynamics and, consequently, tailor prevention and intervention efforts to the needs of an at-risk population. One of the analytic tools the MTNT used to investigate the HIV outbreak was Collaborative Advanced Analytics & Data Sharing (CAADS), managed by Leidos, which incorporates self-service data preparation technology from Trifacta, along with Alpine Data, Tableau, Arcadia Data, and Centrifuge Systems.
In this session, Ells Campbell, a computational biologist, will outline the CDC's approach to the data inference, characterization, and visualization processes being pioneered by the MTNT. Ryan Weil, chief scientist at Leidos, will then explain how the CDC has leveraged the CAADS platform, including Trifacta, to execute its analysis. Finally, Connor Carreras, a customer success manager at Trifacta, will demo how the MTNT leverages Trifacta's data wrangling solution to perform deeper exploration and more efficient transformation of complex epidemiologic data.
Founder and Chief Technical Officer
Senior Software Engineer
Organizations deploying Hadoop are storing, organizing, processing, and analyzing more data than ever before, and the number of analytic applications natively integrating with Hadoop has grown rapidly in the last few years. Consequently, there are often hundreds or thousands of business and data analysts who leverage Hadoop clusters to explore, wrangle, visualize, and operationalize data for diverse use cases. As cluster utilization increases, however, maintaining performance of both exploratory and production use cases becomes critical.
Sean Kandel shares best practices for building and deploying Hadoop applications to support large-scale data exploration and analysis across an organization and demonstrates techniques for amortizing exploratory workloads across clients to scale deployments while limiting performance degradation. Along the way, Sean explains how to flexibly compile queries across multiple runtime engines to optimize both analytic and transformation queries, comparing benchmarks for multiple architectures to demonstrate the effects of these techniques in data lake initiatives.