Data Quality Template:

Validate File Data with Schema Drifts

Figure: Schema Drift Detection Flow (the flow view of this template)

splitrows, header, $sourcerownumber, join

This template shows how to validate file data against an expected schema and detect when the data's schema has drifted from what was expected. It uses Designer Cloud’s ability to import data as is, without applying the inferred row-splitting step, and compares the input file's headers to the expected schema’s headers through a join. The results are then split into two outputs: if the input file matches the expected schema, the Output – Valid Header output contains the input data; otherwise, the data from the invalid input appears in the Output – Invalid Header output.
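
The routing logic described above can be sketched outside Designer Cloud in a few lines of Python. This is only an illustration of the idea, not the template's actual recipe: the function names `first_row` and `route_by_header` are hypothetical, and the sketch assumes that "matching" means the first row of the input is identical to the first row of the expected-schema file.

```python
def first_row(raw_text):
    """Return row 1 of an unsplit file as a single string (empty if no rows)."""
    lines = raw_text.splitlines()
    return lines[0].strip() if lines else ""


def route_by_header(expected_raw, input_raw):
    """Return (valid_rows, invalid_rows) for the input file's data rows.

    Assumption: the header is valid only when the input's first row is
    exactly equal to the expected file's first row, which mimics joining
    the two header rows on their full text.
    """
    header_matches = first_row(expected_raw) == first_row(input_raw)
    data_rows = input_raw.splitlines()[1:]  # everything after the header row
    if header_matches:
        return data_rows, []
    return [], data_rows


# Hypothetical sample data, for illustration only.
expected = "id,name,amount\n1,a,2"
good_input = "id,name,amount\n7,x,9\n8,y,3"
drifted_input = "id,amount,name\n7,9,x"

valid, invalid = route_by_header(expected, good_input)
drift_valid, drift_invalid = route_by_header(expected, drifted_input)
```

With the sample data, the rows of `good_input` land in the valid output, while the rows of `drifted_input` land in the invalid output because its columns are reordered.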

To customize this template for your own use, create three distinct datasets and use them to replace the existing datasets in this flow template.

1) A file that defines the expected schema, with the header metadata in its first row. This file can also contain some sample data. Import it into Designer Cloud as an unstructured file (see below).

2) An input file to validate against the expected schema. This file should also have its header metadata in its first row. Import it into Designer Cloud twice: once as an unstructured file and once as a structured file.

3) Replace InvalidHeader-Source-Unstructured.csv with the unstructured dataset from step 2), and replace InvalidHeader-Source-Structured.csv with the structured dataset from step 2). Replace Expected-Target-Unstructured.csv with the dataset from step 1).

A note on importing a file as unstructured:

When you import a file into Designer Cloud, by default it tries to infer how to split the data into records by automatically applying a splitrows transform. Normally you cannot see or modify this step. You can disable it by unchecking the “Detect structure” option on the import dataset settings page.
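
As a rough analogy in Python (not Designer Cloud's implementation), the difference between the two import modes looks like this. The function names are hypothetical, and the sketch assumes a comma-delimited file: the unstructured path keeps each row as one unparsed string paired with its 1-based row number, similar in spirit to $sourcerownumber, while the structured path lets the parser infer columns from the header row.

```python
import csv
import io


def split_rows_unstructured(raw_text):
    """Split raw text into (row_number, line) pairs without parsing columns."""
    return list(enumerate(raw_text.splitlines(), start=1))


def read_structured(raw_text):
    """Parse the text as CSV, using the first row as column names."""
    return list(csv.DictReader(io.StringIO(raw_text)))


# Hypothetical sample data, for illustration only.
raw = "id,name\n1,a\n2,b"

unstructured_rows = split_rows_unstructured(raw)
structured_rows = read_structured(raw)
```

In the unstructured form the header is still ordinary data in row 1, which is what makes it possible to compare it against the expected header before any columns are inferred.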

New user?

If your data is mostly on Google Cloud Platform, please use Dataprep. Otherwise, choose Designer Cloud.
