Generating Output

  • ESSENTIALS: Generating Output

    Overview: Once you have built a recipe, you are ready to apply that recipe to the source data and generate an output. In Wrangler Desktop, select Generate Results and choose the output format you would like to generate. In Wrangler Enterprise, select Run Job, choose an environment to run on, and select a publishing action and location. Validating your results: When you have generated your re…

  • FAQ: How does Trifacta security integrate with Hadoop?

    Trifacta can interact with the Hadoop cluster using two authentication modes: System mode and User Impersonation mode. In System mode, a single 'trifacta' system account performs actions in Hadoop. This account is given a Kerberos keytab for the cluster. The Trifacta server uses the keytab to authenticate using Kerberos delegation tokens to perform Hadoop actions, such as accessing WebHDFS o…
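
    The calls Trifacta itself issues are internal to the product, but the general pattern of a Kerberos-authenticated WebHDFS request can be illustrated with a minimal sketch. The NameNode address and directory path below are placeholders, not Trifacta settings, and a valid Kerberos ticket (for example, obtained with kinit from the keytab) is assumed to exist already.

```python
# Minimal sketch: list an HDFS directory over WebHDFS using Kerberos auth.
# Host, port, and path are placeholders, not Trifacta configuration values.
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

NAMENODE = "http://namenode.example.com:50070"  # hypothetical NameNode address
PATH = "/user/trifacta"                         # hypothetical directory

# Assumes a Kerberos ticket already exists in the credential cache.
auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)

resp = requests.get(f"{NAMENODE}/webhdfs/v1{PATH}",
                    params={"op": "LISTSTATUS"}, auth=auth)
resp.raise_for_status()

# WebHDFS returns a JSON FileStatuses document describing each entry.
for status in resp.json()["FileStatuses"]["FileStatus"]:
    print(status["pathSuffix"], status["owner"], status["permission"])
```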

  • HOW TO: Change Output Home Directory

    If you want to change the output directory for Trifacta Wrangler, follow these steps: Click the User Preferences menu from the toolbar. Select your name from the drop-down. Update the path of your desired Output Home Directory or select Browse. NOTE: We recommend creating a new directory or using these default locations: for OS X, /Users/username/trifacta; for Windows, C:\Users\username\tri…
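
    The recommended defaults differ by operating system. A small, hypothetical sketch of picking and creating such a directory is shown below; the OS X path mirrors the default above, while the Windows path is an assumption because the excerpt above is truncated.

```python
# Minimal sketch: create a default output home directory per operating system.
# Paths are illustrative; adjust them to match your own account and deployment.
import getpass
import os
import platform

user = getpass.getuser()
if platform.system() == "Darwin":            # OS X default from the text above
    output_home = f"/Users/{user}/trifacta"
elif platform.system() == "Windows":         # assumed full name; truncated above
    output_home = rf"C:\Users\{user}\trifacta"
else:                                        # assumption for other systems
    output_home = os.path.expanduser("~/trifacta")

os.makedirs(output_home, exist_ok=True)
print("Output Home Directory:", output_home)
```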

  • HOW TO: Generate Results or Run a Job

    After you have created your Recipe and are happy with how its steps are applied to the sample in the Transformer page, you can apply the steps to your entire dataset. Follow the steps below: Click Run Job or Generate Results (Trifacta Wrangler). In the dialog, select the running environment: Trifacta Server for small files (< 100MB) or Hadoop for larger files (Enterprise Only). For …
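
    As a rough illustration of the size guidance above, here is a hypothetical helper that picks a running environment from the source file size. The 100MB threshold comes from the text; the function name and file name are made up, and Trifacta itself makes this choice in the Run Job dialog.

```python
# Minimal sketch of the size rule described above: small files run on the
# Trifacta Server (Photon), larger files run on Hadoop (Enterprise only).
import os

SMALL_FILE_LIMIT = 100 * 1024 * 1024  # ~100MB, per the guidance above

def pick_running_environment(source_path: str) -> str:
    """Hypothetical helper that mirrors the rule of thumb in the dialog."""
    size = os.path.getsize(source_path)
    return "Trifacta Server (Photon)" if size < SMALL_FILE_LIMIT else "Hadoop (Enterprise only)"

# Example usage (placeholder file name):
# pick_running_environment("sales_2017.csv")  ->  "Trifacta Server (Photon)"
```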

  • FAQ: Which type of Job do I launch?

    Overview: When using Trifacta, you build a recipe of transformation steps on top of a sample of your dataset. When you are happy with how your dataset looks, you can execute that recipe and apply the transformation steps to your dataset as a whole. Trifacta gives you two execution options: Trifacta Server (using Trifacta's in-house Photon Compute Engine) and Hadoop (using Apache Sp…

  • FAQ: Troubleshooting a Hadoop Job Failure

    You can use the Trifacta and Hadoop job logs to troubleshoot when your Trifacta job fails to run at scale on Hadoop. The Trifacta logs are located under your root Trifacta installation in the “logs” directory. This directory contains the following logs: webapp.log. This log monitors interaction with the Trifacta web application. You will be able to see issues related to jobs running locall…
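
    If you need to skim these logs quickly, a minimal sketch for scanning webapp.log for error lines is shown below. The log name comes from the text above; the installation root is a placeholder for wherever Trifacta is installed in your environment.

```python
# Minimal sketch: scan webapp.log under the Trifacta "logs" directory for errors.
# The installation root below is a placeholder; adjust it to your deployment.
from pathlib import Path

TRIFACTA_ROOT = Path("/opt/trifacta")          # hypothetical install location
webapp_log = TRIFACTA_ROOT / "logs" / "webapp.log"

with webapp_log.open(errors="replace") as log:
    for line in log:
        if "ERROR" in line or "Exception" in line:
            print(line.rstrip())
```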

  • DEFINITION: Results Card

    You can use the Results Cards to view all of the result files that you have generated for a given dataset. The following table explains the elements shown on each Results Card (icon and description): view the summary profile for your result file; open the result file (this file has been saved to your local machine); view the Recipe that was used to generate the results; shows the percentage of …

  • FAQ: What file permissions does Trifacta set on job result files written to HDFS?

    The file permissions that Trifacta sets for job result files depend on two factors: how user accounts are configured on HDFS, and how Trifacta is configured to interface with HDFS. See the following article for details on how Trifacta interfaces with Hadoop security: FAQ: How does Trifacta security integrate with Hadoop? If your Trifacta installation interacts with Hadoop through User Impersonation mode, t…
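
    To see which permissions actually ended up on a result file, you can inspect it with the standard HDFS client. A minimal sketch is shown below; the result path is a placeholder, not a Trifacta default, and the hdfs CLI is assumed to be on the PATH.

```python
# Minimal sketch: list a job result directory on HDFS to inspect owner,
# group, and permission bits. The result path is a placeholder.
import subprocess

RESULT_DIR = "/trifacta/queryResults"   # hypothetical output location

listing = subprocess.run(
    ["hdfs", "dfs", "-ls", RESULT_DIR],
    capture_output=True, text=True, check=True,
)
# Each line shows the permission string, owner, and group for a result file.
print(listing.stdout)
```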

  • HOW TO: Download Results

    When you have successfully Run your Job/Generated Results, you can download a local file in the form of a CSV or Tableau TDE (Tableau Data Extract) file. Downloading a CSV: To download a .csv file, you will need to: click on the more options (three dots) icon, select Export Results, and click on CSV (highlighted in blue). Likewise, you could have accessed the Export Results dialog box i…
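
    Once the CSV has been downloaded, you can sanity-check it outside Trifacta. A minimal sketch using pandas is shown below; the file name is a placeholder for whatever name your download was saved under.

```python
# Minimal sketch: load the downloaded CSV and print a quick summary of it.
# The file name is a placeholder for your actual download.
import pandas as pd

results = pd.read_csv("wrangled_results.csv")   # hypothetical download name
print(results.shape)                # rows x columns
print(results.columns.tolist())     # column names
print(results.head())               # first few rows
```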

  • FAQ: What is a Job?

    In Trifacta Wrangler, a “job” refers to an execution of a Recipe on an entire file. When you create a Recipe in Trifacta, you get a real-time preview, on a sample of your source data, of what the output will look like. Running a Job then applies that Recipe to the entirety of your source data and generates the output. The so…

  • HOW TO: Add Publishing Options

    When you Run a Job, you have the option of adding publishing options to your output. These options include: choosing an output directory in Hive, HDFS, S3, etc.; choosing an output format, including CSV, JSON, Avro, and Parquet; appending to an existing file, replacing an existing file, or adding a new file to the directory; choosing between single- or multi-file output; choosing …
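
    When a job writes a multi-file output, the destination directory typically contains several part files. A hypothetical sketch of stitching CSV parts back together is shown below; the directory name and "part-*" naming pattern are assumptions about the output layout, not guaranteed Trifacta defaults.

```python
# Minimal sketch: combine a multi-file CSV output into a single DataFrame.
# The directory and "part-*" naming are assumptions about the output layout.
from pathlib import Path
import pandas as pd

output_dir = Path("job_output")                 # hypothetical output directory
parts = sorted(output_dir.glob("part-*.csv"))

combined = pd.concat([pd.read_csv(p) for p in parts], ignore_index=True)
print(f"{len(parts)} part files, {len(combined)} total rows")
```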

  • HOW TO: Download the Current Sample from the Transformer Grid

    Overview: If you want to quickly download the sample you are working on in the Transformer Grid, rather than run a job/generate results on the entire dataset, you can do that from the Recipe panel. To download the sample in the grid: open the Recipe Panel, select the more options (three dots) button, and choose to download the sample as CSV. You will be prompted to choose a destination path …