Wrangling Data

  • ESSENTIALS: Wrangling Data

    Overview At Trifacta, we emphasize the importance of Human-Computer interaction. As such, everything in the Transformer Page allows for human input and interaction. From most accessible to most flexible, we'll outline the three core access points to wrangling your data in Trifacta Predictive Interaction Based on your interaction with the data, Trifacta's machine learning algorithms will…

  • DEFINITION: Transformer Page

    On the Transformer page, you can identify the data that you need to transform and build your transformation recipes. When you make changes to your recipe, those changes are immediately applied to the sample data shown in the data grid. In real-time, you can preview the results of your transforms and quickly iterate through the data wrangling process. The Transformer page contains the following ele…

  • DEFINITION: Preview Cards

    Preview Cards display suggestions for transform steps that you can apply to the data in the Transformer Grid. The suggestions vary depending on the data you have selected. You can click on each card to preview the effect that the displayed transform step will have on your data. Preview Cards are displayed at the bottom of the Transformer page.Tip: As you mouse over areas of the Transformer page …

  • FAQ: What are the 'Add to Recipe' and 'Modify' buttons?

    In Release 4.0.1 and earlier, the Add to Recipe and Modify buttons appear on the panel at the bottom of the screen. To add a transform to your recipe, click a suggestion card. Then, click Add to Recipe. If needed, you can customize the selected suggestion. Click the suggestion card and then click Modify. This will bring you into Transform Builder, where you can complete your edits. In Releas…

  • HOW TO: View the Wrangle Recipe for a Dataset

    In the Transformer page, click the Open Recipe button on the right side of the data grid: The recipe, which is the set of sequential transformation steps to apply to your dataset, is displayed in the panel: To make changes, click the recipe step to edit. Select from the available options. keywords: wrangle recipe, transformer…

  • HOW TO: Aggregated Table vs. Flat Aggregate

    Overview Aggregates can provide useful calculations and data summarizations of one or many variables. If you do not need to maintain the structure nor all of the details of your dataset, an Aggregated Table can provide a condensed, summarized view of your data. If you do need to maintain the structure and details of your dataset, a Flat Aggregate can provide calculations in a new column without re…

  • FAQ: What do the column charts on the Transfomer page show?

    Overview The bar chart at the top of each column characterizes the data in that column. The bar charts are calculated from the sample rows currently shown in transformer. The bar chart for each column is customized to teh data type and content of that column: For numerical types (integers, floats), each bar represents the number of values that fall into a range of values and are sorted in ascend…

  • HOW TO: Download Your Recipe

    To download your recipe: Open the Recipe Panel Click on More Options (three dots) Select Download Recipe as Wrangle To copy and paste Recipe steps from your .wrangle file to Trifacta: Open your .wrangle file in a text editor Copy the step starting with the transform verb Paste into the Choose a Transformation box …

  • HOW TO: Collect a New Sample

    Sampling is only available in Wrangler Enterprise.New random samples can be generated to get more accurately representative distributions of a your data. However, if generating a random sample for a certain version of the dataset's script would produce a sample that is equivalent to an existing random sample (e.g. no transforms have been done that alter the makeup of the data’s rows), then …

  • FAQ: What is a Dictionary?

    A dictionary is a structured data set which consists of one or more columns. Dictionaries represent master data, and may be used to map values in one data set to a canonical (preferred/uniform) value.See also:FAQ: What is a Custom Type?VIDEO: Using Dictionaries to Validate Data Typeskeywords: dictionary, custom type, type validation, master data, map…

  • FAQ: What is Epoch/Unix Time?

    Unix time (a.k.a. POSIX time or Epoch time) is a system for describing instants in time, defined as the number of milliseconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds.*Unix or epoch time is often used in computer logs. Trifacta offers the ability to convert dates to and from this format. See unixtimeformat and dateforma…

  • FAQ: What is a Left Outer Join?

    OverviewA left outer join (or left join) does not require that there be matching records for each value in the key value of the source (left) table. Each row in the left table appears in the results, regardless of whether there are matches in the right table.ExampleA user wants to perform a left outer join on the following two tables using DepartmentID as a key:Employee table:NameDepartmentIDRoleD…

  • FAQ: What is a Right Outer Join?

    OverviewA right outer join (or right join) is the reverse of a left join. A right join does not require that there be matching records for each value in the key value of the secondary (right) table. Each row in the right table appears in the results, regardless of whether there are matches in the left table.ExampleA user wants to perform a right outer join on the following two tables using Departm…

  • FAQ: What is an Inner Join?

    OverviewAn inner join requires that key values exist in both tables for the records to appear in the results table. Records appear in the merge only if there are matches in both tables for the key values. An inner join can be used to eliminate rows with null values in their key fields.If you wish to include rows containing non-matching values, you must use some form of an outer join. ExampleA use…

  • FAQ: What is a Full Outer Join?

    OverviewA full outer join combines the effects of a left join and a right join. If there is a match between the key values, a row is written in the result.If there is no match for a key value that appears in either table, a single record is written to the result, with NULL values inserted for the fields from the other table. ExampleA user wants to perform a full outer join on the following two tab…

  • HOW TO: Identify Specific Rows

    SOURCEROWNUMBER The sourcerownumber() function contains the row number in the original, source dataset. Using the sourcerownumber() function allows you to make transformations to specific rows in a dataset. You can use this function to promote any row to the header and delete rows or a range of rows. You can use the sourcerownumber() function in conditional statements, and can create a column inde…

  • FAQ: What is the difference between the “split” and “extract” transforms?

    The “split” transform and the “extract” transform treat the original column and the matched pattern differently.Handling of the original column: When you apply a “split” transform to your dataset, Trifacta drops the original column. When you apply an “extract” transform to your dataset, Trifacta preserves the original column. Handling of the matched pattern: The matched pattern is …

  • HOW TO: Undo and Redo Transforms

    Overview We all make mistakes at some point or another. Often we would like to undo our mistakes. All steps ever added to or deleted from your recipe are saved in Trifacta, so you can undo back to any previous point in time. Simply clicking on the undo and redo steps in the recipe panel will correct any mistaken steps you add. …

  • FAQ: What is a Join?

    A join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as it is. A join is a means for combining fields from two tables (or more) by using values common to each.* Trifacta supports four types of joins: Inner  Left Outer         …

  • HOW TO: Create a Boolean Column

    Overview A boolean column is used to identify whether or not a value is true or false, 1 or 0, yes or no, or any other combination of true or false. In Trifacta, there are various ways you can create boolean columns. Check for Value Create a new column that checks to see if a value is present, and if so, return true. To do so: Click on the bar chart corresponding the value you want to check for…

  • FAQ: Does Trifacta support Java UDFs?

    A Java UDF is a User Defined Function written in the Java programming language. Trifacta supports uploading custom Java UDFs to be integrated into your Trifacta workflows.…

  • HOW TO: Apply Transforms to Multiple Columns

    Several Trifacta Transformation verbs can be applied to multiple columns. These functions perform transformations on two or more columns instead of on a single column. An example of Transformation using multiple columns can be seen below: The following Transforms allow the use of multiple columns in one or more parameters: Aggregate Delete Derive Drop Keep Merge Move Set …

  • HOW TO: Concatenate a Column and a String

    Overview Some times it is necessary to merge information to a column in your dataset. to do this in Trifacta: Use the Set Transform to merge a string with an existing column or the Derive Transform to create a new column that merges a string with info from a column in your dataset In the Formula box, use the merge function The syntax should follow this format MERGE(["string", column]…

  • HOW TO: Sort Records in a Dataset

    Overview Sorting data can provide contextual information about a column, and can also be used with row based operations. You can sort by one column or multiple columns for more granular detail. Sort by One Column To sort by a column: Click on the Column Menu for the column you'd like to sort by Hover over Edit Column Then hover over Sort Lastly, click either Ascending…

  • HOW TO: Filter a Dataset

    You can filter a dataset in Trifacta based on the values contained in a column.1. In the Transformer Grid, brush over an example of the value that you want Trifacta to use as a filter. In the following image, we have selected the text that says “MTC”: 2. Trifacta generates a set of cards containing suggested transforms.3. Click the suggestion card that contains the Keep transform: The K…

  • HOW TO: Convert Elements in an Array to Multiple Rows

    Overview Arrays are useful data structures for storing information. It is sometimes useful to convert each element of an array into it's own unique row value. To give an example, take the table below, where column2 is an array: Column1 Column2 x ["a","b"] y ["c","d"] We want to convert the above table to the following: Co…

  • HOW TO: Calculate the Maximum and Minimum Values in a Column

    Overview Knowing the maximum and minimum values for certain variables can be valuable in many analytics proceses. In Trifacta, it's simple to calculate the maxima and minima using both aggregated tables and flat aggregates. In the examples below, we will calculate the maximum using an aggregated table, and the minumum using a flat aggregate. Calculating the Maximum as an Aggregated Table …

  • HOW TO: Calculate the Difference Between Two Dates

    Overview Calculating the difference between two dates can be a useful metric for analysis. To calculate the difference in days between two dates in Trifacta: Click on the column header for the two dates you'd like to calculate the difference between Select the suggestion to Create a new column using the Datedif function Edit the suggestion In the Datedif function, change the third parame…

  • HOW TO: Change the Format of a Date

    Overview Dates often come in different formats, and it is sometimes necessary to reformat dates to meet a desired output. In the following example, we will reformat a date in the structure of MM/dd/yyyy hh:mm:ss.SSS, which reads two digit month/two digit day/four digit year two digit hour:two digit minute: two digit second. three digit milisecond to MMM-dd-yyyy, which reads abbreviated month-two…

  • HOW TO: Create an Object of Key-Value Pairs

    Overview An object stores information in a key-value pair, where the key is the a string and the value is the corresponding value for that record. An example of an object is as follows: {"First Name": "John", "Last Name": "Doe", "Age": 35} There are several ways to create Objects in Trifacta, depending on the data within the column(s) you would lik…

  • HOW TO: Combine Multiple Columns

    Overview Some datasets will have multiple columns with information you would like combined into one column. Trifacta has two ways to combine multiple columns which will be demonstrated below. The first way to combine multiple columns is to use the Merge Transform. To use the Merge Transform: Click on the column headers for the columns you would like to combine Find the suggestion to Merge colu…

  • FAQ: What is data aggregation and when should I aggregate?

    Data aggregation is a fundamental practice in data warehousing and analytics, as it allows consumers of business intelligence products to quickly assess, draw conclusions and make decisions based on large amounts of raw data. The idea of an aggregate is to apply calculations across categories and then use the resulting high-level summary information to present overall statistics. For example, you …

  • HOW TO: Drop Multiple Columns

    Overview In many cases, datasets come with information that will not be useful in downstream analysis. To quickly filter down your dataset, it is often desirable to drop multiple columns at once. To drop multiple columns in Trifacta: Open the Columns view select the columns you'd like to filter using ctl/command + click for multi-column select and shift+click to select a range Select Act…

  • HOW TO: Manage Currency Conversions

    To standardize your dataset, you may be required to convert data from one currency to another. This article explains how to manage currency conversions in your datasets using a lookup to an exchange rate reference table.STEP1: Build Your Exchange Rate Reference Dataset1. Create a source file to use as the basis for your exchange rate reference dataset. You can create the reference file in any form…

  • HOW TO: Unnest an Object

    Overview An object stores information in a key-value pair, where the key is the a string and the value is the corresponding value for that record. An example of an object is as follows {"First Name": "John", "Last Name": "Doe", "Age": 35} When unnesting an Object, the keys become columns, with the values for each row becoming the row values. To…

  • HOW TO: Extract the Domain from a URL

    Overview Extracting the Domain from a URL can be helpful in standardizing information for further analysis. To Extract the Domain from a URL column: Click on the column header of the URL column Select the suggestion to Create a new column using the Domain Function Edit the suggestion and inspect the Transform. This suggestion uses the Derive transform. …

  • HOW TO: Use a Reference Dataset to Standardize Column Values

    Often, the records in a dataset contain invalid or non-standard data. These formatting issues can impact the overall quality of the data analytics processes dependent on the data. This article explains how to use a lookup to a reference dataset to standardize the values in a column.STEP 1: Build your Reference Dataset of Standardized Values1. Create a source file to use as the basis for your refer…

  • HOW TO: Calculate the Standard Deviation of a Column

    Overview The standard deviation is a statistical measurement of the extent to which a variable deviates from the mean and is important in many statistical contexts. You can calculate the standard deviation in Trifacta using an aggregated table or a flat aggregate (See HOW TO: Aggregated Table vs. Flat Aggregate). Aggregate You can calculate the Standard Deviation of a column in an aggregated table…

  • HOW TO: Find the Three Largest Values in a Column

    Calculate Third Largest Value Step 1 to identifying the three largest values in a column is to calculate the third largest value. To do this, you will need to Derive a new column and use the KTHLARGEST() Function. This will create a column containing the third largest value of the column you are evaluating. Create Binary Indicator Column that Identifies Three Largest Values Once you have Derive…

  • HOW TO: Clean Mismatched Values

    If a value does not match the data type of a column, Trifacta will label it as mismatched. In most cases, you will want to clean these values so that they match the rest of the values in the column. We will do this through two different operations: The Data Quality Bar (red represents mismatched values) The Column Menu Example 1: Replacing mismatched values with custom values using the Data Quali…

  • HOW TO: Clean Missing Values

    Often times your dataset will have missing values. In some cases, you may want to replace these values with data from elsewhere in the dataset, or replace them with a custom value. This can be done through: The Data Quality Bar (black represents missing values) The Column Menu Example 1: Replacing missing values with 'N/A' using the Data Quality Bar Example 2: Replacing missing valu…

  • ERROR: Requested Data Not Found in Trifacta Wrangler

    Problem Description When you open a previously created dataset in Trifacta Wrangler, the data will not load and the following error message appears: Please reload page (query execution failed). File not found. Reason: Requested data was not found : <file_path>.   Cause This error occurs when the datasource associated with your dataset has been moved, and is no longer located at the path sh…

  • HOW TO: Union – Custom Column Name Matching

    There are two standard ways to Union datasets together, by column name or by column position. If your datasets do not match well with either method, you can use custom column matching matching to manually combine columns together. 1. Click Add Datasets and Align by Name when adding the dataset to the union. You can also align by position but aligning by name may be easier. 2. In the following ex…

  • HOW TO: Multi Column Selection

    Columns View You can select multiple columns using ctl + click (command on Mac) or shift + click in the columns view: Builder You can select multiple columns in builder using the Columns parameter (depending on the Transform) and Transform dependent parameters: You can also select all columns in your dataset using the wildcard (*) character The following Transforms can be used across multiple c…

  • HOW TO: Chain Recipes

    If you want to break up your recipe so that you can generate an output at the current point in your recipe, but add additional steps to the dataset for separate output, you can chain recipes together. This is often important in cases where you are restructuring the dataset using an aggregate, pivot, or similar transform. To acocmplish this: Click on the Wrangled Dataset you would like to chain an…

  • HOW TO: Convert a Date to Unix Time

    Overview Converting a date to Unixtime can be helpful for calculations that use the number of seconds passed since a certain point in time. Unixtime measures the number of seconds that has passed between January 1st, 1970 at 0 degrees longitude and the current date value. Note: Trifacta measures Unixtime in milliseconds, so dividing by 1000 is necessary to convert to the Unixtime generated to seco…