Data Wrangling in Pharma & Life Sciences

September 17, 2015

In this blog post, Trifacta’s Director of Data Science and Solution Engineering Tye Rattenbury chats with Brian Ellerman, Sanofi’s Head of Technology Scouting & Innovation. Sanofi is a global healthcare leader with sectors in research, development, and pharmaceutical and therapeutic manufacturing and marketing.

At Trifacta, we want to revolutionize the way people and organizations work with data. Our first major step towards that end is a product that enables a new class of Big Data analysts – mixing technical and organizational innovation. Of course, understanding how companies and entire industries are addressing these kinds of changes helps us direct our efforts. Arguably, the industry undergoing the largest transformation in this era of Big Data is pharmaceuticals, given recent advances in technology combined with government healthcare reforms. To dive deeper into the analytics challenges and opportunities facing pharmaceutical organizations, we sat down with Brian Ellerman of Sanofi to get an insider’s perspective. Included below is an excerpt of that conversation:

Q: Brian, thanks for taking the time to chat with us today. To get started, why don’t you describe your role as Head of Technology Scouting & Innovation at Sanofi and how you influence data and analytics strategy?

Brian: My pleasure. The role is really two-fold: identify innovative technology trends – and the companies leading them – that best connect to business problems and opportunities, and utilize human networks and change management to execute proofs of concept to evaluate fit of the technologies to the problems. Because of this, I’m part of the group leading the strategy and technology evaluation at Sanofi. This involves looking across the many areas of our business and identifying data, technology, skills, and partners common to each, so that we avoid reinventing the wheel as much as possible and realize the true potential of big data analytics.

Q: How has the proliferation of data types from different segments of the biopharma value chain (e.g., genomic, clinical) impacted data usability and, ultimately, data analysis in the industry?

Brian: It’s really one of the great challenges of our current age. On the one hand, there’s tremendous opportunity, for the first time ever, to connect an individual’s health with phenomics – physical traits, activity, and behavior – and genomics – the complete set of cellular DNA, including all genes, and the disease pathways to which they are linked. On the other hand, each of these sets of data comes from different, often unrelated sources, with varying degrees of data standardization, required expertise, and provenance. So the analysis is hamstrung by how much time must be spent verifying and cleaning that data, and even then the analysis covers only each data set's fairly narrow scope, not the larger, connected view I mentioned.

Q: Given the billion-dollar implications of the drug delivery pipeline, how do you see data and analytics playing a role in strategic decision-making?

Brian: Data and analytics are the lifeblood. Consider the challenge I mentioned previously: today the biopharma industry runs clinical trials in which people are randomly assigned to one of two groups. Imagine if you could instead have a near-infinite number of ‘groups’ because you’re able to evaluate the new drug individually, based on phenomics and genomics. Rather than halting an entire trial (at a cost of many years and nearly a billion dollars), a company could ‘mass customize’ the trial as it proceeded, moving far more rapidly and inexpensively.

Q: How does technology play a role in your strategy? What are some of the organizational challenges that must be overcome?

Brian: It should be clear from my earlier answers that technology plays a pivotal role, from the biomedical technologies that uncover new disease mechanisms and treatment paths, to the informational technologies that manage, analyze, and visualize the deluge of data. The challenge here is that historically these two technological classes have been managed by different groups of people within an organization, with one unaware of its impact on the other.

Q: How can Data Wrangling change the way data flows or is leveraged across your organization to impact critical analysis?

Brian: Getting all of our data into a cohesive and useful form means those two groups will finally have insight to share with one another. That leads to better, data-driven decisions and to the evolution of a converged information strategy, which is my ultimate aim.

Q: Do you have any advice for biopharma companies that are trying to define their big data strategy?

Brian: For one, don’t assume yours is the only company fighting its way through this. I’ve met a lot of people from a lot of companies who are facing this precise problem. For another, don’t assume the solution is evenly distributed. One of the reasons we’ve partnered closely with Trifacta is that they’ve proven time and again that they have the internal expertise to work with us on our data wrangling, and if it’s in a new domain, they’re more than willing to learn.