What is Data Lineage?
Data lineage describes the lifecycle of data: where it originates and how it moves and changes over time. The ability to track, manage and view data lineage makes it easier to trace errors back to their source and to debug the data flow process. By using lineage information within the analytics process, organizations can shorten decision-making, strengthen data loss prevention and make compliance and auditing more efficient and cost-effective.
Data has become a critical component of just about every aspect of business. It’s widely reported that approximately 90% of the world’s data has been created in the last two years. This explosion in volume is a result of the growing number of systems and automated processes in organizations of every size. Knowing where the data came from, how it was used, how it was modified and where it goes are all equally important. Data lineage also shows what happens to the data as it passes through any number of business processes. Using the history of your data in your analysis gives visibility into those processes and makes it far simpler to trace errors back to the root cause. Data lineage is a critical part of your business intelligence solution.
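To make the idea of tracing an error back to its root cause concrete, here is a minimal sketch in Python. It assumes each processing step is recorded with its output, its inputs and a short description of the change. The LineageStep record, the trace_back function and the dataset names are hypothetical, chosen only for illustration, and do not come from any particular product.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class LineageStep:
    output: str        # dataset produced by this step
    inputs: List[str]  # datasets the step read from
    operation: str     # human-readable description of the change


def trace_back(dataset, steps):
    """Walk backward from a dataset to the steps and sources behind it.

    Returns (history, sources): the recorded steps, nearest first, and the
    set of datasets that no recorded step produced (the original sources).
    """
    by_output = {step.output: step for step in steps}
    history, sources = [], set()
    queue, seen = deque([dataset]), {dataset}
    while queue:
        name = queue.popleft()
        step = by_output.get(name)
        if step is None:
            # Nothing produced this dataset, so treat it as an origin.
            sources.add(name)
            continue
        history.append(step)
        for parent in step.inputs:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return history, sources


# Hypothetical pipeline, used only to exercise the function.
STEPS = [
    LineageStep("clean_orders", ["raw_orders"], "drop rows with missing order_id"),
    LineageStep("enriched_orders", ["clean_orders", "crm_customers"], "join on customer_id"),
    LineageStep("revenue_report", ["enriched_orders"], "sum amount by region"),
]

history, sources = trace_back("revenue_report", STEPS)
for step in history:
    print(f"{step.output} <- {step.inputs}: {step.operation}")
print("original sources:", sorted(sources))
```

If a figure in revenue_report looks wrong, the history lists every transformation that could have introduced the error, and the sources show which upstream systems to check.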
“Requirements in the era of digital transformation are expanding the definition of lineage to not only include where and how but also answer the rest of the five Ws of data,” says Stewart Bond, director, Data Integration Software, IDC. “Data lineage is a core element in emerging data intelligence solutions, bringing more insight about the data itself and delivering even more impact and value for data-driven organizations.”
Data Lineage with Trifacta
Trifacta Wrangler’s column lineage feature provides a view into data lineage to support an organization’s data analysis initiatives. The feature also provides the visibility needed to support several business-critical compliance and regulatory requirements. It lets the user see how a column or set of columns was created, and look forward at the downstream dependencies. To learn more about Trifacta’s data wrangling solution, download the white paper Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance.
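The two directions described above, looking back at how a column was created and looking forward at what depends on it, can be illustrated with a small dependency graph. The sketch below is a generic Python illustration, not Trifacta's implementation or API. The ColumnLineage class, its method names and the column names are all assumptions made for the example.

```python
from collections import defaultdict


class ColumnLineage:
    """Hypothetical column-level lineage graph (illustration only)."""

    def __init__(self):
        self._parents = defaultdict(set)   # column -> columns it was built from
        self._children = defaultdict(set)  # column -> columns built from it

    def add_derivation(self, column, derived_from):
        """Record that `column` was created from the columns in `derived_from`."""
        for parent in derived_from:
            self._parents[column].add(parent)
            self._children[parent].add(column)

    @staticmethod
    def _walk(start, edges):
        # Depth-first traversal collecting everything reachable from `start`.
        found, stack = set(), [start]
        while stack:
            node = stack.pop()
            for neighbor in edges.get(node, ()):
                if neighbor not in found:
                    found.add(neighbor)
                    stack.append(neighbor)
        return found

    def upstream(self, column):
        """All columns that contributed, directly or indirectly, to `column`."""
        return self._walk(column, self._parents)

    def downstream(self, column):
        """All columns that depend, directly or indirectly, on `column`."""
        return self._walk(column, self._children)


# Hypothetical columns, used only to exercise the class.
lineage = ColumnLineage()
lineage.add_derivation("full_name", ["first_name", "last_name"])
lineage.add_derivation("greeting", ["full_name"])
lineage.add_derivation("name_length", ["full_name"])

print(sorted(lineage.upstream("greeting")))
print(sorted(lineage.downstream("first_name")))
```

Keeping both a parent map and a child map means either question can be answered with the same traversal, at the cost of storing each edge twice.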