Validate File Data with Schema Drifts

[Figure: Schema Drift Detection Flow, the flow view of this template]

splitrows, header, $sourcerownumber, join

This template shows how to validate file data against an expected schema and detect when the schema of incoming data has drifted from what was expected. It relies on Trifacta’s ability to import data as is, without applying the inferred row-splitting step, and compares the input file’s header with the expected schema’s header through a join. The results are then split into two outputs: if the input file matches the expected schema, the Output – Valid Header output contains the input data; otherwise, the data of the invalid input appears in the Output – Invalid Header output.
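The comparison described above can be sketched outside of Trifacta in plain Python. This is only an illustration of the idea, not the template's actual recipe: the helper names `first_line` and `route_by_header`, the sample data, and the exact-match rule on the raw header line are all assumptions made for the example.

```python
# Output names taken from the template description.
VALID_OUTPUT = "Output – Valid Header"
INVALID_OUTPUT = "Output – Invalid Header"


def first_line(text: str) -> str:
    """Return the raw header line of a file's contents, without splitting it into columns."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def route_by_header(expected_text: str, input_text: str) -> str:
    """Name the output the input file would land in (hypothetical helper).

    Stands in for joining the expected header row to the input header row:
    a match on the raw header line is treated as "no schema drift".
    """
    if first_line(expected_text) == first_line(input_text):
        return VALID_OUTPUT
    return INVALID_OUTPUT


# Made-up sample files: one matching the expected schema, one that has drifted.
expected = "id,name,amount\n1,Ann,10\n"
same_schema = "id,name,amount\n7,Bob,25\n"
drifted = "id,full_name,amount,currency\n7,Bob,25,USD\n"

print(route_by_header(expected, same_schema))
print(route_by_header(expected, drifted))
```

Comparing the raw header line is the strictest possible rule: a renamed, added, removed, or reordered column all count as drift. A real flow might choose a looser rule.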

To customize this template for your own use, you will need to create three datasets from two files and use them to replace the existing datasets in this flow template.

1) A file that contains the expected schema, with the header metadata in the first row. This file can also contain some sample data. Import this file into Trifacta as an unstructured file (see below).

2) An input file to validate against the expected schema. This file should also have its header metadata in the first row. Import this file into Trifacta twice: once as an unstructured file and once as a structured file.

3) Replace InvalidHeader-Source-Unstructured.csv with the unstructured dataset from step 2), and replace InvalidHeader-Source-Structured.csv with the structured dataset from step 2). Replace Expected-Target-Unstructured.csv with the dataset from step 1).

A note on importing a file as unstructured:

When you import a file into Trifacta, it tries by default to infer how to split the data into records by automatically applying a splitrows transform. Normally you cannot see or modify this step, but you can disable it by unchecking the “Detect structure” option on the import dataset settings page.
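As a rough analogy for the two import modes (this is plain Python, not Trifacta code), a structured import splits each record into named columns, while an unstructured import keeps every line as a single raw value. The helper names `read_structured` and `read_unstructured` and the 1-based row number, loosely modeled on $sourcerownumber, are assumptions made for the example.

```python
import csv
import io


def read_structured(text: str) -> list[dict[str, str]]:
    """Parse the text as CSV, using the first row as column names."""
    return list(csv.DictReader(io.StringIO(text)))


def read_unstructured(text: str) -> list[tuple[int, str]]:
    """Keep each line whole, paired with its 1-based source row number."""
    return list(enumerate(text.splitlines(), start=1))


# Made-up sample file contents.
sample = "id,name\n1,Ann\n2,Bob\n"

structured = read_structured(sample)
unstructured = read_unstructured(sample)

# In the unstructured view the header is still an ordinary row (row 1),
# so it can be picked out and compared like any other value.
header_row = next(line for number, line in unstructured if number == 1)

print(header_row)
print(structured[0])
```

In the structured view the header has already been consumed to name the columns, which is why the unstructured copy is the convenient one for header comparison.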

