Pushing Data Engineering Tasks to Go Faster with New Pushdown Optimization

March 23, 2021

In the modern data stack, the hard work of data transformation—the “T” in “ELT”—is pushed into powerful cloud data warehouses. But because raw data is everywhere—in files, on-premises relational databases, NoSQL databases, and SaaS applications—it’s not always possible to push the full data transformation workload into your cloud data warehouse.

You need flexibility that doesn’t sacrifice performance. 

Introducing New Pushdown Optimization Capability

That’s why Trifacta has developed a new product feature: pushdown optimization. We designed it to harness the power of your cloud data warehouse to do all of your self-service data cleansing and transformation through SQL-based ELT. 

Workloads can now be pushed down to the underlying source, wherever possible, while still leveraging the large-scale computing power of cloud data warehouses. This feature is available for Snowflake, Google BigQuery, Amazon Redshift, Oracle DB, PostgreSQL, MS SQL Server, and Azure Synapse.

The result? Data engineering tasks run faster inside your cloud data warehouse. And business users, who are always hungry for more data, have a natural low-code/no-code ELT companion to get data ready for BI initiatives faster than ever.

Let’s look a little more closely at two key capabilities of this new pushdown optimization feature: filter pushdown and column pruning.  

Filter Pushdown

If you want to read data from a relational database and apply filters while wrangling data in Trifacta, the optimal engine for executing those filters is usually the underlying relational database, not the core Trifacta data processing engine. Trifacta will now intelligently push down these filters directly to the database, which reduces the amount of data moving through the recipe and removes redundant steps from the wrangle recipe. “Filter pushdown” is also known as “early selection” in database processing parlance.
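To make the idea concrete, here is a minimal sketch of how a planner might compile recipe filter steps into the source query’s WHERE clause instead of filtering in the processing engine. Trifacta’s actual implementation is not public; the `build_source_query` function and the recipe-step structure below are illustrative assumptions, not Trifacta’s API.

```python
# Sketch of filter pushdown ("early selection"): filter steps are lifted
# out of the recipe and compiled into the source SQL's WHERE clause, so
# the database discards rows before they ever reach the engine.
# (Hypothetical names; not Trifacta's actual API.)

def build_source_query(table, steps):
    """Split recipe steps into filters (pushed to SQL) and the rest."""
    predicates = [s["predicate"] for s in steps if s["kind"] == "filter"]
    remaining = [s for s in steps if s["kind"] != "filter"]
    sql = f"SELECT * FROM {table}"
    if predicates:
        sql += " WHERE " + " AND ".join(predicates)
    return sql, remaining

recipe = [
    {"kind": "filter", "predicate": "order_date >= '2021-01-01'"},
    {"kind": "derive", "expr": "revenue = price * qty"},
]
sql, rest = build_source_query("orders", recipe)
print(sql)   # SELECT * FROM orders WHERE order_date >= '2021-01-01'
print(rest)  # only the derive step remains for the engine
```

The filter runs where the data lives; only the derived-column step stays behind for the engine.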

Column Pruning

Similarly, if you want to wrangle only a subset of the columns of the original data, you don’t need to ingest all the columns into Trifacta. Now you can prune the columns you don’t need and ingest only the relevant data, which greatly reduces the ingest load and improves execution performance. “Column pruning” is also known as “early projection” in database processing parlance.
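The same idea, sketched for column pruning: rather than `SELECT *`, the generated source query lists only the columns the recipe actually references. As above, the function and column names are illustrative assumptions, not Trifacta’s API.

```python
# Sketch of column pruning ("early projection"): project only the columns
# the recipe references, shrinking the ingest before any rows move.
# (Hypothetical names; not Trifacta's actual API.)

def prune_columns(table, all_columns, used_columns):
    """Build a source query that selects only the columns in use."""
    cols = [c for c in all_columns if c in used_columns]
    return f"SELECT {', '.join(cols)} FROM {table}"

query = prune_columns(
    "orders",
    all_columns=["id", "price", "qty", "notes", "shipping_addr"],
    used_columns={"id", "price", "qty"},
)
print(query)  # SELECT id, price, qty FROM orders
```

Wide tables with free-text or audit columns benefit most, since those unused columns never leave the database.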

More Pushdown Optimizations Coming Soon

Both column pruning and filter pushdown are now available in the Trifacta platform and are turned on by default for any new flows you create. The rest of your flow executes as a hybrid: on the source wherever possible, and on the Trifacta execution engine otherwise.

Our next blog in this series will focus on intelligent execution when both the source and destination are the cloud data warehouse, full compilation to the cloud data warehouse, and how this enables your move from ETL to ELT. We’re hard at work developing more pushdown optimizations just as fast as we can. Watch this space for new optimizations—like join, union, aggregate, and expression pushdown—coming soon.

We’ll also be announcing more on this topic at our upcoming Wrangle Summit. Register for FREE here!