
Validate File Data with Schema Drifts

[Figure: Schema Drift Detection Flow. The flow view of this template.]

Transformations:
splitrows, header, $sourcerownumber, join

This template shows how to validate file data against an expected schema and detect when the schema of incoming data has drifted from what was expected. It relies on Trifacta’s ability to import data as is, without applying inferred row splitting, and compares the input file’s headers to the expected schema’s headers through a join. The results are then split into two outputs. If the input file matches the expected schema, the Output – Valid Header output contains the input data; otherwise, the data from the invalid input appears in the Output – Invalid Header output.
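For readers who want to see the underlying check outside of Trifacta, here is a minimal Python sketch of the same idea. It is not the template's recipe: it compares the two header rows directly instead of through a join, and the function names (`read_header`, `route_by_header`) and sample data are hypothetical.

```python
import csv
import io


def read_header(raw_text, delimiter=","):
    """Return the first row of a delimited file as a list of column names."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    return [name.strip() for name in next(reader, [])]


def route_by_header(expected_text, input_text):
    """Label the input "valid" or "invalid" based on an exact header match."""
    expected = read_header(expected_text)
    actual = read_header(input_text)
    label = "valid" if actual == expected else "invalid"
    return label, input_text


# Hypothetical sample files, inlined as strings so the sketch is self-contained.
expected_file = "id,name,amount\n1,Ann,10\n"
good_input = "id,name,amount\n7,Bob,25\n"
drifted_input = "id,full_name,amount,currency\n7,Bob,25,USD\n"

good_result = route_by_header(expected_file, good_input)
drifted_result = route_by_header(expected_file, drifted_input)
print(good_result[0])     # valid
print(drifted_result[0])  # invalid
```

The sketch treats a renamed, added, removed, or reordered column as drift. Whether column order should count is a design choice that depends on how downstream steps consume the data.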

To customize this template, create three datasets to replace the existing datasets in the flow: one from the expected-schema file and two from the input file.

1) A file that defines the expected schema, with the header metadata in the first row. The file can also contain some sample data. Import this file into Trifacta as an unstructured file (see below).

2) An input file to validate against the expected schema. This file should also have its header metadata in the first row. Import this file into Trifacta twice: once as an unstructured file and once as a structured file.

3) Replace InvalidHeader-Source-Unstructured.csv with the unstructured dataset from step 2), and replace InvalidHeader-Source-Structured.csv with the structured dataset from step 2). Replace Expected-Target-Unstructured.csv with the dataset from step 1).

A note on importing a file as unstructured:

When you import a file into Trifacta, by default it tries to infer how to split the data into records by automatically applying a splitrows transform. Normally you cannot see or modify this step. You can disable it by unchecking the “Detect structure” option on the import dataset settings page.
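As a rough analogy only, the difference between the two import modes can be pictured in plain Python. This sketch does not use any Trifacta API. The "unstructured" view keeps each line as raw text paired with its source row number (similar in spirit to what $sourcerownumber exposes), while the "structured" view promotes the first row to column names. The sample data is hypothetical.

```python
import csv
import io

# Hypothetical file contents, inlined so the sketch is self-contained.
sample = "id,name,amount\n1,Ann,10\n2,Bob,25\n"

# "Unstructured" view: one raw string per line, with its 1-based source row number.
unstructured = list(enumerate(sample.splitlines(), start=1))

# "Structured" view: header row promoted to column names, remaining rows parsed.
structured = list(csv.DictReader(io.StringIO(sample)))

# In the unstructured view the header is still ordinary data on row 1,
# so it can be extracted and compared against an expected header.
header_line = next(text for row_number, text in unstructured if row_number == 1)
print(header_line)            # id,name,amount
print(structured[0]["name"])  # Ann
```

Keeping the header as data is what allows it to be compared against an expected header before any column names are assigned.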

Already have an account?

Download template (Trifacta version) and import it on the Flows page.

Is your data on Google Cloud?

  1. Download template (Dataprep version)
  2. Launch Dataprep on Google Cloud
  3. Import it on the Flows page
