FAQ: Trifacta Architecture Overview

Trifacta Daemon Services
When you start Trifacta, the Trifacta service launches the following daemon services:
  • Webapp: Application frontend for users. Webapp also contains a JavaScript engine that allows the user to execute small-scale jobs directly on the Trifacta server.
  • Monitor: Checks the status of running jobs. It tracks running Hadoop jobs by polling ZooKeeper.
  • Batchserver: Submits batch Pig jobs to the cluster. It converts the wrangle script to Pig and executes the Pig job, writing the output to HDFS.
  • PostgreSQL: Database that contains metadata about users and jobs.
  • Supervisord: Checks the status of Trifacta components.
  • Diagnostic Server: Contains useful diagnostics and tests for the Trifacta service.
  • ML Service: Contains the machine learning components of Trifacta.
  • Java and Python UDF Service (UDF): Allows Trifacta to run User Defined Functions on the data sets.
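Since Supervisord is a standard process manager, each of the daemons above would typically be registered as a supervised program. The fragment below is a hypothetical illustration of what one such entry could look like; the program name, command path, and log location are assumptions, not Trifacta's actual configuration:

```ini
; Hypothetical supervisord program entry for one Trifacta daemon.
; Paths and names are illustrative placeholders only.
[program:webapp]
command=/opt/trifacta/bin/webapp        ; assumed launch script
autostart=true                          ; start with supervisord
autorestart=true                        ; restart on unexpected exit
stdout_logfile=/var/log/trifacta/webapp.log
```

With entries like this in place, `supervisorctl status` reports the state of each registered component, which matches Supervisord's role above of checking the status of Trifacta components.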
Trifacta-Hadoop Interaction
The user interacts with the Trifacta web-based front-end to wrangle data. A user can upload local files to Trifacta or load a file directly from HDFS using WebHDFS. Once the user has finished writing a wrangle script using Trifacta’s web interface, the user submits either a local or batch Hadoop job to execute the wrangle script. The execution call is sent through the Monitor, which tracks the progress of the jobs. Depending on the job type, the processing is handled by either the built-in JavaScript engine or by MapReduce on Hadoop. The results of the job are written to HDFS.
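Loading a file over WebHDFS, as mentioned above, is an ordinary REST call against the NameNode. The sketch below builds such a request URL; the host name, port (50070 is the classic default NameNode HTTP port), and file path are illustrative assumptions, not values taken from a Trifacta installation:

```python
# Minimal sketch of constructing a WebHDFS read URL.
# Host, port, and HDFS path below are illustrative placeholders.
from urllib.parse import urlencode

WEBHDFS_PREFIX = "/webhdfs/v1"

def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL, e.g. op=OPEN to read or op=CREATE to write."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}{WEBHDFS_PREFIX}{hdfs_path}?{query}"

# An HTTP GET on this URL (following the redirect to a DataNode)
# streams the file's bytes back to the client.
url = webhdfs_url("namenode.example.com", 50070, "/data/sales.csv", "OPEN")
print(url)
# http://namenode.example.com:50070/webhdfs/v1/data/sales.csv?op=OPEN
```

The same URL scheme supports write operations, which is how job results could land back in HDFS after execution.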

The following diagram shows the interaction between a user, the Trifacta daemon services, and the elements of a Hadoop installation: