Big Data Comes to Life (Sciences)

March 15, 2014

Breakthroughs in life science research are driven by access to data, so it is no surprise that open access to data has long been a topic of discussion in the scientific community. For many years PLoS, a publisher of seven scientific journals including PLoS One, the largest scientific journal, has encouraged researchers to share their data. Recently, however, it changed its policy to require open access to that data.

Starting this month, all authors who submit to PLoS will have to guarantee that the data used in their paper is publicly accessible and sign a data availability statement. This move highlights the fact that healthcare and the life sciences are among the most interesting emerging areas where Big Data approaches are coming into play.

New Big Data Approaches to Analysis

What do we mean by Big Data approaches? We’re seeing organizations that are much more focused on “gather all the data, and figure it out later.” Contrast this with what we typically used to see in biology and healthcare organizations, where the goal was a single source of truth, backed by rigid data standards and heavily curated data repositories.

Today’s shift in approach is facilitated by a return to the philosophy of “schema on use” that is encouraged by technologies like Hadoop, and approaches like MAD Skills.  The PLoS announcement is a great example of this — ensuring that the life science community captures the data, without getting doctrinaire about how it should be structured or coded.  Ensuring open access to data in all its raw forms promotes the type of exploration, collaboration, and quick time to insight that differentiates a Big Data approach from traditional data management and analytics.

Steps Beyond Data Access

This commitment to capture data is step one in a journey toward successful modern usage of data.  The next step is to actually make schema on use the norm — by enabling the efficient and malleable transformation of the captured data for a variety of business use cases.  With schema on use, a single group of medical records can support everything from evidence-based medicine to predictive analytics that reduce hospital readmissions, or even strategic planning for a healthcare organization.
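To make the idea concrete, here is a minimal sketch of schema on use in Python. It is an illustration under stated assumptions, not a description of any particular product: the records, field names, and helper functions (`read_stays`, `readmissions_within`, `admissions_by_department`) are all hypothetical and the data is made up. The raw records are stored untouched, and each use case applies its own structure only at read time.

```python
import json
from collections import Counter
from datetime import date

# Raw records kept exactly as captured; fields may vary per record.
# All values below are made-up, illustrative data.
RAW_RECORDS = [
    '{"patient": "p1", "admitted": "2014-01-03", "discharged": "2014-01-07", "dept": "cardiology"}',
    '{"patient": "p1", "admitted": "2014-01-20", "discharged": "2014-01-22", "dept": "cardiology"}',
    '{"patient": "p2", "admitted": "2014-02-01", "discharged": "2014-02-02", "dept": "oncology", "notes": "follow-up"}',
]


def read_stays(raw_lines):
    """Schema for readmission analysis: patient id plus parsed dates."""
    for line in raw_lines:
        rec = json.loads(line)
        yield (
            rec["patient"],
            date.fromisoformat(rec["admitted"]),
            date.fromisoformat(rec["discharged"]),
        )


def readmissions_within(raw_lines, days=30):
    """Return patients admitted again within `days` of a prior discharge."""
    last_discharge = {}
    readmitted = set()
    # Process each patient's stays in chronological order.
    stays = sorted(read_stays(raw_lines), key=lambda s: (s[0], s[1]))
    for patient, admitted, discharged in stays:
        previous = last_discharge.get(patient)
        if previous is not None and (admitted - previous).days <= days:
            readmitted.add(patient)
        last_discharge[patient] = discharged
    return readmitted


def admissions_by_department(raw_lines):
    """Schema for planning: only the department field is read."""
    return Counter(json.loads(line).get("dept", "unknown") for line in raw_lines)
```

Both functions read the same `RAW_RECORDS`; neither requires the data to have been reshaped in advance, and the extra `notes` field on one record is simply ignored by the views that do not need it.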

The Big Data ecosystem is waking up to this need for tooling to assist with “schema on use” analytics, and we see it first-hand in the healthcare domain with some of our customers, where the needs are urgent. It will be interesting to see how the scientific community follows suit.
