FAQ: Creating a Datasource from HDFS

Which objects in HDFS can I use to create a datasource?
You can create a datasource based on an HDFS file or an HDFS folder.

How does Trifacta create a datasource based on a folder?
When you select a folder as the basis for a datasource, Trifacta creates a single datasource that contains the contents of all files in that folder. If the folder contains subfolders, Trifacta searches them recursively and includes the contents of their files in the datasource. Trifacta assumes that all of the files in the selected folder have the same structure and file format. It ignores any common HDFS metadata files found in the selected folder or its subfolders.
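
To make the folder behavior concrete, here is a minimal Python sketch of a recursive file listing that skips metadata files. It is an illustration only, not Trifacta's implementation: it walks a local directory as a stand-in for HDFS, the function name collect_data_files is hypothetical, and the rule that names starting with "_" or "." are metadata (for example _SUCCESS or .crc files) is an assumption, since the FAQ does not say which files Trifacta ignores.

```python
import os

# Assumption: names starting with "_" or "." are treated as metadata
# (e.g. _SUCCESS, .part-0000.crc). The FAQ does not list the actual names.
METADATA_PREFIXES = ("_", ".")


def collect_data_files(root):
    """Return the sorted paths of all data files under root, including subfolders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(METADATA_PREFIXES)]
        for name in filenames:
            if name.startswith(METADATA_PREFIXES):
                continue  # skip metadata files
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Running this on a folder before selecting it is one way to preview roughly which files would end up in the combined datasource.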

Can I create a datasource based on a folder that contains both a JSON file and a CSV file?
No. You can use a folder as the basis for a datasource only when all of the files in that folder have the same file format. Instead, select either the CSV file or the JSON file individually as the basis for your datasource.
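
If you want to check a folder for mixed formats before selecting it, a rough pre-check could look like the Python sketch below. It uses the file extension as a proxy for the file format, which is an assumption: the FAQ does not describe how Trifacta detects formats. The function names file_formats and has_single_format are hypothetical.

```python
import os


def file_formats(file_paths):
    """Return the set of lowercase file extensions found among the given paths."""
    return {os.path.splitext(path)[1].lower() for path in file_paths}


def has_single_format(file_paths):
    """Return True if every path has the same extension (or the list is empty)."""
    return len(file_formats(file_paths)) <= 1
```

A folder holding both data.csv and data.json would fail this check, which matches the case described above where you select one file instead of the folder.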

Can I create a datasource based on Trifacta’s job results?
Yes. Trifacta writes job results to a specified output folder on HDFS. You can use the job results as the basis for a new datasource by navigating to the output folder in the Add datasource pane and selecting the appropriate file. You can also create a new datasource directly from the Job Results page.