Business Analytics

How GSK Built an Analytics Center of Excellence (COE) with Data Wrangling

April 5, 2018

When Mark Ramsey joined GlaxoSmithKline (GSK) in 2015 as its SVP of the R&D Data office, he saw an opportunity to instill change. He considered the life sciences and pharmaceutical industry a “lagging industry” from an analytics perspective, despite its appetite for data, because he rarely saw companies reuse their data. “There’s lots of data that’s created in the execution of a clinical trial, for example, but if it’s only used for that one clinical trial, you’ve really missed an opportunity to harness the power of the data,” Ramsey says. “That’s where financial services and other industries have been a lot more innovative in bringing this data together, and really making it available as an asset across the organization as opposed to just at its initial point of contact.”

As a 300-year-old business, GSK had amassed petabytes of data in more than 2,100 silos and largely operated in the siloed fashion that Ramsey described. Researchers and scientists struggled to access historical clinical trial data. At best, they lost time gathering that information; often, the difficulty also impeded improvements to new clinical trials. Part of that challenge derived from the freedom these researchers and scientists were granted in creating and capturing data. The variance in their data standards, not to mention the wide-ranging types of data they collected, created a significant challenge for GSK data scientists when they attempted to conform this data for reuse. The process wasn’t efficient, and GSK’s R&D organization was often deprived of the data it needed.

Building an analytics center of excellence (COE) was, to Ramsey, the best path forward. A COE model concentrates big data knowledge and best practices for the organization within a small team but ultimately entrusts business users with the right tools to drive value themselves. “The center of excellence is the catalyst; it’s showing people the art of the possible,” Ramsey says. “But it’s still extremely important to have those folks who are actually doing the analytic work be very integrated into the business function.” Inherent to creating a COE at GSK was adopting a new set of technologies that would propel this vision forward. The COE team specifically selected interoperable technologies, such as the data storage and processing platform Cloudera and the data ingestion technology StreamSets, that would allow them to bring in data at massive scale and speed, rationalize the data, and put it into an industry-standard ontology. To provide access to business users, however, GSK needed the data wrangling platform Trifacta.

With Trifacta, business users can visually understand and transform raw data to fit their individual analytic needs without requesting service from the COE team. “It’s like the old adage of teaching them to fish, instead of just giving them a fish,” Ramsey says. “In this particular case, by using Trifacta, it really enables them to do a lot more self-service than they would ever be able to do, otherwise.” Now, Trifacta is deployed across the GSK organization to hundreds of business users, allowing them to tap into data that they never had visibility into before. With the ability to drive new insights, GSK has dramatically reduced the time spent on clinical trial design and is nearing its vision of cutting drug development time in half.

Designing a COE is a strategic move for any business but, as Ramsey admits, it’s important to keep focus on the business users. All of the work involved in building a COE from the ground up—recruiting the right team, implementing the right technologies, and all of the work required on the backend in terms of moving or rationalizing data—is in jeopardy if business users don’t have the opportunity to leverage data for real value. “Ultimately, getting the data in the hands of the consumer is critically important, so that they can then actively make the decisions,” Ramsey says, “and Trifacta is a key component of that.”

To learn more about how GSK leverages Trifacta, watch the testimonial video below: