Hello and welcome to a new year and a new set of exciting capabilities from Trifacta with our latest 8.11 release. This release brings larger dataset sample sizes, pushdown optimization for sampling jobs on BigQuery, CSV publishing that keeps mismatched values, new connectors, and more. We have a lot to share with you. Let’s go.
Increased flexibility with larger sample sizes for datasets
Continuing with the theme of flexibility in data engineering, the ability to choose the size of your dataset sample is critical to ensure all anomalies are identified and corrected before the data is used. Trifacta enables you to design data transformation steps, called a ‘recipe’, against a sample of the dataset. Last year, we introduced the capability to change the size of your sample with an easy-to-use slider, up to 10 MB. We now allow you to load a larger sample of up to 40 MB, which can help identify additional anomalies if required. The default sample size is 10 MB. Adjusting the dataset sample size can also help you improve the performance of Trifacta in your browser and address any low-memory conditions.
Click here to learn more about the flexibility of dataset sample sizes.
Performance gains of up to 30x with Sampling using Pushdown on BigQuery
As part of our vision for data modernization and the modern data stack, we have been making strides towards intelligent and efficient approaches to data transformation. This has helped our customers adopt paradigms such as ELT and make data-driven decisions that support their business outcomes. Along the way, we launched significant Pushdown Optimization capabilities for leading cloud data warehouses (CDWs) such as Google BigQuery. With this approach, the data transformation steps, called recipes within Trifacta, are automatically translated into SQL queries and executed within the data warehouse to leverage the scale and power of the CDW. We support use cases where the source table is in BigQuery and where the source files are in GCS in CSV or JSON format. We are happy to hear from customers who have reported performance gains of up to 30x with our Pushdown capabilities on BigQuery.
We now extend this capability to sampling jobs from Trifacta. With sampling jobs executed in BigQuery, you get a faster and better design-time experience. Trifacta supports a variety of sampling methods, and BigQuery execution is now supported for the Random, Head, Filter, and Anomaly sampling methods.
Click here to learn more about this newest capability with BigQuery execution.
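To make the idea of pushdown more concrete, here is a minimal sketch in Python of how a list of recipe steps could be folded into a single nested SQL query. This is purely illustrative and is not Trifacta’s implementation: the `Step` class, the step kinds (`filter`, `derive`, `keep`, `head`), and the `recipe_to_sql` function are all hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """A hypothetical recipe step (assumption: not Trifacta's internal format)."""
    kind: str       # "filter", "derive", "keep", or "head"
    arg: str        # SQL fragment, column list, or row count
    name: str = ""  # new column name, used only by "derive"


def recipe_to_sql(table: str, steps: list[Step]) -> str:
    """Wrap each step around the previous query, so the first step is innermost."""
    query = f"SELECT * FROM `{table}`"
    for step in steps:
        if step.kind == "filter":
            query = f"SELECT * FROM ({query}) WHERE {step.arg}"
        elif step.kind == "derive":
            query = f"SELECT *, {step.arg} AS {step.name} FROM ({query})"
        elif step.kind == "keep":
            query = f"SELECT {step.arg} FROM ({query})"
        elif step.kind == "head":
            # A "first N rows" style sample expressed as a LIMIT clause.
            query = f"SELECT * FROM ({query}) LIMIT {int(step.arg)}"
        else:
            raise ValueError(f"unsupported step: {step.kind}")
    return query


if __name__ == "__main__":
    recipe = [
        Step("filter", "amount > 0"),
        Step("derive", "amount * 0.2", name="tax"),
        Step("head", "1000"),
    ]
    print(recipe_to_sql("my_project.sales.orders", recipe))
```

The point of the sketch is that the whole recipe becomes one query that the warehouse can run where the data lives, rather than moving rows out to be transformed elsewhere.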
Flexibility while publishing CSV files to include mismatches
Being flexible with data engineering tasks such as publishing files is a key aspect of Trifacta’s value to our target audience of engineers, analysts, and anyone who works with data. You can now publish your CSV files even if there are mismatched values in certain columns. Previously, when a CSV file was published, mismatched values for a data type in a column were written as null values. This could potentially lead to loss of data that could impact downstream applications. With the 8.11 release, mismatched values are now written as string values for CSV files. To enable this, we have an option on the user interface while publishing to include or exclude mismatched values, providing increased flexibility. This new capability is enabled by default for new flows and CSV publishing actions.
For existing flows that publish CSV files, the previous behavior of writing mismatched values as null values is maintained. Click here to learn more about this new capability.
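The difference between the two behaviors can be sketched in a few lines of Python. This is an illustration only, not how Trifacta publishes files: the `publish_csv` function and its `include_mismatched` flag are hypothetical names, and a "mismatch" is simplified here to "the value cannot be parsed as the column's type".

```python
import csv
import io


def publish_csv(rows, columns, include_mismatched=True):
    """Write rows as CSV text; `columns` maps column name -> type to validate against."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns.keys())
    for row in rows:
        cells = []
        for name, expected in columns.items():
            raw = row[name]
            try:
                expected(raw)  # raises ValueError if the value does not fit the type
                cells.append(raw)
            except ValueError:
                # Mismatch: keep the original text, or write an empty (null) cell.
                cells.append(raw if include_mismatched else "")
        writer.writerow(cells)
    return out.getvalue()


if __name__ == "__main__":
    rows = [{"id": "1", "qty": "5"}, {"id": "2", "qty": "five"}]
    columns = {"id": int, "qty": int}
    print(publish_csv(rows, columns, include_mismatched=True))
    print(publish_csv(rows, columns, include_mismatched=False))
```

With `include_mismatched=True`, the second row keeps `five` as text; with `False`, that cell is written empty, which is where downstream data loss can creep in.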
New connectors with Trifacta 8.11
We continue our journey to help you connect to any data source, enabling additional use cases. With Trifacta 8.11, we support the following new connectors.
- Asana: A web and mobile application to help teams organize, track, and manage their work.
- Exact Online: Online business software for business owners and accountants.
- Facebook Ads: Paid messages for businesses to showcase on the Facebook Newsfeed.
- Jira by Atlassian: A software application used for project management.
- Pinterest: A visual discovery engine for finding ideas and inspiration for home and work.
- QuickBooks Online: Cloud-based software for accounting, payroll, and related areas.
- Trino: A SQL query engine to query large data sets across heterogeneous sources.
You can learn all about our connectivity updates here.
We’d love for you to try these new capabilities in your data engineering journey. If you have not used Trifacta before, it’s a great time to sign up for a free trial today and join us. Onwards and upwards till next time.