The Cost of Expired Data (And What to Do About It)

June 17, 2016

Your business gathers data from multiple sources en masse to derive insights. All of this information is valuable (retailer data, stock reports, security reports), but for certain initiatives, this data also carries an expiration date. That doesn’t mean you’ll never use it again. But when determining which product to put on shelves or what stock to invest in, eventually, there will be more timely, relevant data. And the cost of not leveraging that newer data is inaccurate insights that can “spoil” your business.

The challenge today is that the average expiration date for data is shrinking. As data becomes more readily and more frequently available, businesses must contend with a shrinking window between when data is relevant and when it is not. They need the ability to ingest data, prepare data, analyze data, and act upon it, all much more efficiently than ever before.

The Effects of Expired Data

But first, what exactly are the potential hazards of using “expired” data? Data is data is data, right? For PepsiCo, a Trifacta customer, not quite. Their CPFR (Collaborative Planning, Forecasting, and Replenishment) team supplies the nation’s largest retailers with PepsiCo product delivered in exactly the right amounts, at the right time. Should they forecast incorrectly and supply too much of the wrong product, that’s wasted product on their customers’ shelves. Worse yet, if they undersupply, PepsiCo loses revenue and must appease unhappy customers. With razor-thin profit margins, PepsiCo depends on a constant stream of data. Each new batch of data they receive from their retailers about sales trends supersedes the last.

On the flip side, moving quickly to respond to new data has huge implications for PepsiCo’s competitive advantage. Analyzing retailer data faster than their competition allows PepsiCo to supply their retailers first, ensuring that they have the highest rate of customer satisfaction and maximize their inventory potential.

Close the Gap Between Raw Data & Analysis

For PepsiCo, the key to being able to move fast and deliver analysis quickly was adopting a next-gen data preparation tool, Trifacta. Before, the CPFR team’s data had essentially scaled past Excel and Access capabilities, and the team’s efforts were siloed, which often resulted in inconsistencies or oversights. With Excel, the team worked around the clock to piece together the right analysis, sometimes taking up to 6 months.

Trifacta has allowed PepsiCo to automate the bulk of the process, while also giving them the ability to examine huge datasets all at once, instead of splitting them up into various spreadsheets. To their knowledge, no one else in the industry is doing this. Their analytic build time has been reduced by as much as 90%, and they’re more easily able to spot mistakes, such as duplicates or missing orders.

Accelerating Data Growth

Big data is growing exponentially, and the demand for analyzing it will only increase. As we tap into more and more streams of information—call logs, IoT data, social media—and that data’s expiration date diminishes, analytics teams will face increasing pressure to deliver more complex insights, faster. Just a year ago, Business Insider reported that with all of the data available at our fingertips, we only analyze 0.5% of it—and that number is shrinking as more data is collected. We’re reaching a point of data saturation, in part because we can’t analyze it fast enough before it expires.

Of course, this phenomenon affects businesses of all shapes and sizes, across all industries, but it’s especially impactful when time is of the essence. Analyzing fraud data before your hackers do. Understanding the adverse effects of your drugs before they go to market. Recognizing (and addressing) customer dissatisfaction before it compounds upon itself. All of these events depend upon data, but more than that, they depend upon that data being analyzed quickly.

The Bottom Line

Racing against the clock is never fun: your work is rushed and susceptible to mistakes, and you hardly have time to step back, look at the bigger picture, and address the larger problems. For PepsiCo, getting out of this endless race meant adopting a new tool that empowered them to wrangle more data, faster, without learning new skills or outsourcing their work. Now, their analyses are derived from data that is always fresh, never spoiled.

To learn more about Trifacta’s data wrangling software, try out our free cloud product, Trifacta Wrangler!