Our software releases and updates come fast and furious. We’re excited to share the latest capabilities as part of the Trifacta 9.1 release. As always, we cover a wide range of features related to data engineering. Let’s dive into them.
General Availability of SSH Tunneling Connectivity Support
Hybrid architectures spanning the cloud and on-premises networks are common, especially for large enterprises with applications residing both on-premises and in the cloud. To support and strengthen hybrid architectures, we’re excited to announce the General Availability of connectivity using SSH Tunneling. This expands on our previous limited preview announcement of this capability with our 8.10 release last year.
To help you connect to hosts such as database servers deployed within a private network, you can now enable SSH Tunneling within Trifacta. With SSH Tunneling, only the SSH port needs to be open for access from public networks, so you don’t need to whitelist specific IP addresses or open application ports to access these hosts. SSH Tunneling is a widely accepted technology that encrypts all data in transit, thereby maintaining a secure transport session.
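To make the idea concrete, here is a minimal sketch of generic SSH local port forwarding, the mechanism that SSH tunneling relies on. It is not how Trifacta configures its tunnels. The host names, user name, and ports are hypothetical, and the function only builds the standard OpenSSH command without running it.

```python
import shlex


def build_tunnel_command(ssh_host, ssh_user, target_host, target_port,
                         local_port, ssh_port=22):
    """Build an OpenSSH command that forwards a local port to a private host.

    All argument values are illustrative; nothing here is Trifacta-specific.
    """
    forward = f"{local_port}:{target_host}:{target_port}"
    return [
        "ssh",
        "-N",             # forward ports only, do not run a remote command
        "-L", forward,    # local_port -> target_host:target_port through the SSH host
        "-p", str(ssh_port),
        f"{ssh_user}@{ssh_host}",
    ]


# Hypothetical example: reach a private PostgreSQL server through an SSH host.
cmd = build_tunnel_command(
    ssh_host="bastion.example.com",
    ssh_user="tunnel_user",
    target_host="db.internal",
    target_port=5432,
    local_port=15432,
)
print(shlex.join(cmd))
```

Once such a tunnel is up, a client connects to `localhost:15432` and the traffic travels encrypted to the database port on the private network, which is why the database port itself never has to be opened publicly.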
You can learn more from our technical documentation.
Higher Data Accuracy with Schema Change Detection
A schema is the sequence and data types of the columns in a dataset. Schemas commonly change over time, which can break transformation steps or recipes and corrupt data in downstream applications. This new capability monitors your datasets for schema changes and helps you identify the data sources where the schema has changed. Optionally, the job can fail when this occurs. Detection works by comparing the current schema of the data source with the schema that was previously stored in the database.
Schema changes are detected if columns are added, removed, or moved. You can configure jobs to fail if schema changes are detected. This is supported for JDBC and BigQuery sources and for Avro and Parquet file formats, with support for additional formats coming in upcoming releases. Learn more about this new capability with our community article here.
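The comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not Trifacta's implementation: a schema is modeled as an ordered list of column names, and `diff_schemas`, `check_schema`, `SchemaChangedError`, and the `fail_on_change` flag are hypothetical names.

```python
from dataclasses import dataclass, field


class SchemaChangedError(Exception):
    """Raised when a schema change is detected and failing is enabled."""


@dataclass
class SchemaDiff:
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    moved: list = field(default_factory=list)

    @property
    def changed(self):
        return bool(self.added or self.removed or self.moved)


def diff_schemas(stored, current):
    """Compare two ordered lists of column names."""
    stored_set, current_set = set(stored), set(current)
    added = [c for c in current if c not in stored_set]
    removed = [c for c in stored if c not in current_set]
    # For columns present in both schemas, compare their relative positions.
    # Moving one column can shift others, so several may be reported as moved.
    shared_old = [c for c in stored if c in current_set]
    shared_new = [c for c in current if c in stored_set]
    moved = [c for c in shared_new if shared_old.index(c) != shared_new.index(c)]
    return SchemaDiff(added=added, removed=removed, moved=moved)


def check_schema(stored, current, fail_on_change=False):
    """Return the diff, or raise if a change is found and failing is enabled."""
    diff = diff_schemas(stored, current)
    if diff.changed and fail_on_change:
        raise SchemaChangedError(
            f"added={diff.added} removed={diff.removed} moved={diff.moved}"
        )
    return diff


# Hypothetical stored and current schemas for one data source.
stored_schema = ["id", "name", "email"]
current_schema = ["id", "email", "name", "created_at"]
diff = check_schema(stored_schema, current_schema)
print(diff)
```

With `fail_on_change=True`, the same call raises instead of returning, which mirrors the option to fail the job when a change is detected.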
Better Visibility with Sample Job IDs
With Trifacta, you can quickly start working with your dataset. This is accomplished by automatically generating a sample from the first set of rows of your dataset. Sample jobs are independent executions: at any time, you can specify the type of sample you wish to create and initiate the job to create it. Sampling jobs run in the background.
Samples have unique IDs, which previously could be accessed only on the job history page. You can now see these IDs in the familiar Transformer view, making samples easier to identify. This also improves visibility by showing all the samples along with their IDs on a single screen. To access the details of a sample, including its job ID, click the sample name.
You can learn all about sampling here.
Increased Flexibility with Dataset Configurations
Datasets are the foundation for all data pipelines. Datasets often come from different sources containing extraneous columns, complex column names, or other inconsistencies leading to incorrect inference and inaccurate results. You can now overcome these hurdles by updating and preserving metadata configuration that can be reused consistently and applied each time the dataset is used in a new flow.
With this new capability, you can search for and select a subset of columns to be included in or omitted from flows whenever the dataset is used. You can manually rename columns as part of the reusable dataset and override system-inferred Trifacta data types, and these settings are saved for reuse. This is currently supported for relational sources, delimited files, and schema files such as Parquet and Avro.
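As a rough mental model, a reusable dataset configuration bundles three things: which columns to keep, how to rename them, and which data types to apply. The sketch below illustrates that idea only. `DatasetConfig`, `apply_config`, and the column names are hypothetical and are not part of any Trifacta API.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetConfig:
    include: list                                # source columns to keep, in order
    rename: dict = field(default_factory=dict)   # source name -> new name
    types: dict = field(default_factory=dict)    # source name -> casting function


def apply_config(rows, config):
    """Apply column selection, type overrides, and renames to each row."""
    result = []
    for row in rows:
        new_row = {}
        for col in config.include:
            value = row[col]
            cast = config.types.get(col)
            if cast is not None:
                value = cast(value)              # override the default type
            new_row[config.rename.get(col, col)] = value
        result.append(new_row)
    return result


# Hypothetical raw rows with an extraneous column and awkward column names.
raw_rows = [
    {"cust_id_v2": "42", "amt": "19.90", "debug_flag": "x"},
    {"cust_id_v2": "43", "amt": "5.00", "debug_flag": "y"},
]

config = DatasetConfig(
    include=["cust_id_v2", "amt"],
    rename={"cust_id_v2": "customer_id", "amt": "amount"},
    types={"cust_id_v2": int, "amt": float},
)

clean_rows = apply_config(raw_rows, config)
print(clean_rows)
```

Because the configuration is a separate object from the data, the same `config` can be applied every time the dataset is brought into a new flow.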
Learn more about configuration settings here.
Additional Security with Customer Managed Encryption Keys (CMEK) for Dataflow
Customer Managed Encryption Keys are created, managed, and stored within the cloud key management service. These keys can be applied to individual objects. When they are used, data for the objects scoped by the keys is automatically encrypted when written and decrypted when read.
Now, you can have user-specific CMEKs when using Google Cloud Dataflow. This ensures that any intermediate files created by Dataflow use the CMEKs. This capability is currently in Private Preview. Please contact Trifacta support if you would like to use this feature.
New Connector with Trifacta 9.1
We now have a new connector in Early Preview: Instagram Ads, which is a method of paying to post sponsored content on the Instagram platform. You can learn all about our connectivity updates here.
It’s never too late to sign up for a free trial with Trifacta. Join us today on our journey to the cloud with Trifacta by Alteryx.