Bad Dates: A Nightmare for Archaeologists and Data Analysts

August 14, 2015

One of my favorite movie scenes is from Raiders of the Lost Ark, when a bowl of poisoned dates nearly ends our protagonist, and only a treacherous capuchin monkey sampling the fruit first gives the game away. As a treasure-seeking Big Data Analyst, I know that ‘bad dates’ can be just as devastating in the dark, murky and largely unexplored data of exploratory analytics as they are on the hunt for jewels.

In exploratory analytics, the insights derived from data are potentially game changing, but only as credible as the quality of the data they are founded on. Developing this foundation is an arduous, time-consuming task that requires data from multiple, disparate sources and a meticulous attention to detail that is difficult to maintain within today’s dynamic business environment. Many statistics have been exposed and undone by a simple flaw in the source data they are built upon. Such a flaw invalidates the conclusions drawn, making the result largely meaningless to the business. This cycle of high expectations driven by access to ‘big data’ but followed by a flawed delivery method needs to end. Thus, Data Wrangling with Trifacta was born and is poised to be the solution!

Let’s dive deeper into the problem. A key role of any business analyst is aggregating and assessing raw data to identify and act on trends and patterns.

Assessing activity and behavior using metrics like revenue or margin among customers and products is fundamental to any business. The resulting dashboards allow any stakeholder to easily answer key questions, such as “how do certain products sell across different regions?”, and to make decisions based on the story the metrics tell. If a Product Manager relied on historical sales performance with missing or poorly interpreted data when deciding to discontinue a certain line, the impact on the business could be catastrophic.

So, why is it so difficult to build a solid data foundation for analytics? One of the most important dimensions in any analysis is time. Just about every important event within the enterprise is time-stamped when it occurs: orderDate, shippedDate, customerSinceDate, lastTimeSawRaidersDate. These date fields are key to assessing performance over time, making it possible to produce metrics such as ‘year to date sales’ or ‘month over month profit margins.’
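To make the dependency on dates concrete, here is a minimal sketch of how ‘year to date sales’ and a month-over-month change might be computed from an orderDate field. The order records, amounts and function names are all invented for illustration; this is plain Python, not anything specific to Trifacta.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records: (orderDate, revenue). Values are invented.
orders = [
    (date(2015, 1, 15), 1200.0),
    (date(2015, 2, 3), 800.0),
    (date(2015, 2, 20), 500.0),
    (date(2014, 12, 30), 950.0),  # previous year, excluded from 2015 YTD
    (date(2015, 9, 1), 300.0),    # after the as-of date, excluded from YTD
]


def year_to_date_sales(orders, as_of):
    """Sum revenue for orders dated from January 1 of as_of's year through as_of."""
    start = date(as_of.year, 1, 1)
    return sum(amount for day, amount in orders if start <= day <= as_of)


def monthly_sales(orders):
    """Total revenue per (year, month) bucket."""
    totals = defaultdict(float)
    for day, amount in orders:
        totals[(day.year, day.month)] += amount
    return dict(totals)


ytd = year_to_date_sales(orders, as_of=date(2015, 8, 14))
by_month = monthly_sales(orders)
feb_vs_jan = by_month[(2015, 2)] - by_month[(2015, 1)]

print(ytd)         # 2500.0
print(feb_vs_jan)  # 100.0
```

Every number here hinges on each order landing in the right year and month. A single date parsed into the wrong year silently moves revenue in or out of the total.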

[Image: partial list of date formats]

Improperly formatted date fields can have a significant impact on these analyses. Take a look at the partial list of date formats I have included here. The number of possible ways to represent a date is staggering, and in a programmatic sense it is very difficult to anticipate and properly handle the variety. In many applications, including dashboards and reports, a ‘bad date’ can cause a serious system error. Anyone remember ‘Y2K’? The world narrowly avoided a panic that came down to one question: could financial systems engineered to handle YY cope with going from 99 to 00? Seems pretty trivial unless your code is supposed to ‘Calculate the time it takes to deliver orders to customers,’ and you decide an order placed on 12/18/1999 and delivered on 1/7/2000 took over 99 years to deliver. Probably not a quarterly business review I want to attend.
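The delivery-time example can be sketched in a few lines of Python. The two parsing functions below are hypothetical illustrations, not code from any real system: one blindly assumes every two-digit year belongs to the 1900s, the other applies a pivot year. The pivot of 69 is an assumption chosen to mirror the convention Python's own `%y` directive follows.

```python
from datetime import date


def parse_mdy_fixed_century(text, century=1900):
    """Parse 'M/D/YY' by blindly adding a fixed century (the legacy assumption)."""
    month, day, yy = (int(part) for part in text.split("/"))
    return date(century + yy, month, day)


def parse_mdy_pivot(text, pivot=69):
    """Parse 'M/D/YY' with a pivot: YY below the pivot maps to 20YY, otherwise 19YY."""
    month, day, yy = (int(part) for part in text.split("/"))
    century = 2000 if yy < pivot else 1900
    return date(century + yy, month, day)


ordered, delivered = "12/18/99", "1/7/00"

# Legacy logic reads '00' as 1900, so delivery appears to precede the order.
naive_days = (parse_mdy_fixed_century(delivered) - parse_mdy_fixed_century(ordered)).days

# Pivot logic reads '00' as 2000, giving the real gap of 20 days.
pivot_days = (parse_mdy_pivot(delivered) - parse_mdy_pivot(ordered)).days

print(naive_days / 365.25)  # roughly -99.9 years
print(pivot_days)           # 20
```

Under the fixed-century assumption the gap is not just huge but negative: the order appears to arrive almost a century before it was placed. A pivot is only a convention, though, and the right cutoff depends on the range of dates a dataset can plausibly contain.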

Like Indiana Jones and his search for historical treasure, the Exploratory Analytics Treasure Hunter is constantly battling Date and Time fields in analytics data to ensure accurate and timely results. With Trifacta, the analyst can identify, wrangle and resolve date dimension inconsistencies efficiently and consistently by developing rules. With these errors addressed, the analytics products developed will be more relevant and timely to ultimately improve business value.
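As a rough illustration of what ‘developing rules’ for dates can look like, here is a minimal sketch in plain Python. It is not Trifacta's interface or rule language; the rule list, function names and sample values are assumptions made up for this example. Each rule is a candidate format tried in order, and anything that matches no rule is set aside for review instead of being guessed at.

```python
from datetime import datetime

# Hypothetical rule list: candidate formats tried in order. Order matters,
# because a value like '03/04/2015' is ambiguous and the first match wins.
DATE_RULES = [
    "%Y-%m-%d",   # 2015-08-14
    "%m/%d/%Y",   # 8/14/2015 (month-first is assumed here)
    "%d %B %Y",   # 14 August 2015 (month names depend on the locale)
    "%B %d, %Y",  # August 14, 2015
    "%Y%m%d",     # 20150814
]


def normalize_date(raw, rules=DATE_RULES):
    """Return the ISO 8601 date string for raw, or None if no rule matches."""
    text = raw.strip()
    for fmt in rules:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this rule does not apply; try the next one
    return None


def wrangle(values):
    """Split values into normalized dates and rejects that need a new rule."""
    clean, rejects = [], []
    for value in values:
        iso = normalize_date(value)
        if iso is None:
            rejects.append(value)
        else:
            clean.append(iso)
    return clean, rejects


samples = ["2015-08-14", "8/14/2015", "August 14, 2015",
           "14 August 2015", "20150814", "not a date"]
clean, rejects = wrangle(samples)

print(clean)    # five copies of '2015-08-14' (assuming English month names)
print(rejects)  # ['not a date']
```

Keeping the rejects visible is the point of the design: a value that fails every rule signals a format nobody anticipated, which is better surfaced than silently dropped or misread.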

Data Wrangling with Trifacta: don’t let ‘bad dates’ impact your archaeological data analytics hunt.