HOW TO: Split Data into Multiple Columns

HOW TO: Split Data into Multiple Columns

Overview
The Split transform creates new columns by splitting an existing column around a specified value or pattern. The new columns contain data located to the left and right of the pattern used to create the split. The original column is dropped.



Usage

The following table explains how to use parameters with the Split transform:
Parameter Modifier Usage
Column Argument: Column in your dataset. Required. Use this parameter to specify the column in your dataset that contains the information you want split out. You can only apply the split transform to a single column.
on Argument: Pattern written using any of the allowable text-matching clauses. Optional. Use this parameter to specify the pattern that you want to split on. You can define the pattern using string literals, regular expressions, or Trifacta selection rules. Trifacta will only split on strings that exactly match the pattern specified by this parameter. Only available when toggled to "On pattern"
starting after Argument: Pattern written using any of the allowable text-matching clauses. Optional. Use this parameter to specify the pattern that precedes the string that you want to split on. You can define the pattern using string literals, regular expressions, or Trifacta selection rules. Trifacta will only split strings that occur after the pattern specified by this parameter. Accessible in both "On pattern", "Between two patterns". 

ending before Argument: Pattern written using any of the allowable text-matching clauses. Optional. Use this parameter to specify the pattern that follows the string that you want to split on. You can define the pattern using string literals, regular expressions, or Trifacta selection rules. Trifacta will only split strings that occur before the pattern specified by this parameter. Accessible in both "On pattern" and "Between two patterns"
starting from Argument: Pattern written using any of the allowable text-matching clauses. Optional. Use this parameter to specify the pattern that starts the string that you want to split on. You can define the pattern using string literals, regular expressions, or Trifacta selection rules. Trifacta will only split on strings that occur after the pattern specified by this parameter. Available in "Between two patterns" and "Between two positions".
ending at Argument: Pattern written using any of the allowable text-matching clauses. Optional. Use this parameter to specify the pattern that ends the string that you want to split on. You can define the pattern using string literals, regular expressions, or Trifacta selection rules. Trifacta will only split on strings that occur before the pattern specified by this parameter. Available in "Between two patterns" and "Between two positions".
Delimiter array Array of strings that list the explicit field delimiters in the order to apply them to the column.
Positions array Array of integers that identify the zero-based character index values where to split the column.
Every integer String literal or pattern that appears after the pattern to match
Number of matches Number Optional. Use this parameter to define the maximum number of times you want the pattern split on. Each split on will create a new column.
Ignore matches between Quote Optional. Specifies a quoted object that is omitted from pattern matching.
ignoreCase Argument: Yes/No Optional. Set this parameter to “Yes” to ignore the case of a pattern. Set this parameter to “No” to treat a pattern as case-sensitive.
Note: At least one of the following parameters must be included for the Split transform to be valid:
  • start after
  • ending at
  • ending before
  • starting from
  • on

Example
A user wants to split the data from the following column into multiple columns:


Case 1
Description: Split the column around the comma delimiter.
Builder:
Transformation Column On pattern Number of matches Ignore matches between Ignore case
split column1 on: `,` 5 No
Preview of transform:

Result dataset
:


Case 2
Description: Split the column around the comma delimiter, ignoring instances of that delimiter contained between double quotes.
Builder:
Transformation Column On pattern Number of matches Ignore matches between Ignore case
split column1 on: `,` 2 " No
Preview of transform:

Result dataset
: