Blog

Get the latest insights on data engineering

Posts about Technology

Making the Most of Your BigQuery Investments for Scalable Data Engineering Pipeline

When we released BigQuery Pushdown for Dataprep on Google Cloud back in April, we knew that it was a highly anticipated ELT (Extract, Load & Transform) feature that would improve both design time and processing time. However, we did not expect it to be adopted so quickly. Our internal benchmark of 20x job acceleration was […]

Bertrand Cariou  |  October 11, 2021

Introducing the Trifacta Python SDK

Background: In recent years, Python has become one of the most popular object-oriented programming languages. Whether you are a beginner or an experienced programmer, Python’s simple, easy-to-learn syntax enables quick readability and integration with heterogeneous systems. This simple method of programming makes Python very attractive for scripting as well as connecting different components of software […]

Shyam Srinivasan  |  September 14, 2021

Back to SQL: Data Engineering

As part of growing our massive new Data Science program at Berkeley, it became clear that we needed to target a class specifically for Data Engineering. The goals of Data Engineering are different from those of Software Engineering. So it was interesting to think through this curriculum and how we would teach it differently from our established database classes.

In this new approach, we ended up emphasizing four steps to SQL for Data Engineering that are atypical of a traditional databases class: data quality, data reshaping, “spreadsheet tasks,” and data pipeline testing.

Joe Hellerstein  |  September 7, 2021

Transformation: Next Level SQL

When we use SQL for Transformation—the “T” in ELT—the focus changes. In this case, we’re taking many messy and disparate tables and manipulating them into a more usable or common form. To take our example from before, we may be extracting and loading sales data from 17 electronics chains that sold the phones, and our job in SQL is to write transformation queries that integrate that data together.
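As a rough illustration of the kind of transformation query described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`chain_a_sales`, `chain_b_sales`), their columns, and the sample rows are all hypothetical and stand in for just two of the chains; a real warehouse would have its own schemas and SQL dialect.

```python
import sqlite3

# Hypothetical setup: two retail chains load phone sales in different shapes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chain_a_sales (sale_date TEXT, model TEXT, units INTEGER);
    CREATE TABLE chain_b_sales (sold_on TEXT, phone_model TEXT, qty INTEGER);
    INSERT INTO chain_a_sales VALUES
        ('2021-08-01', 'Phone X', 3),
        ('2021-08-02', 'phone x', 2);
    INSERT INTO chain_b_sales VALUES
        ('2021-08-01', ' PHONE X ', 5);
""")

# The "T" step: map each source table to a common form (same column names,
# normalized model strings), then combine and aggregate inside the database.
TRANSFORM_SQL = """
    SELECT sale_date, model, SUM(units) AS units
    FROM (
        SELECT sale_date, LOWER(TRIM(model)) AS model, units
        FROM chain_a_sales
        UNION ALL
        SELECT sold_on, LOWER(TRIM(phone_model)), qty
        FROM chain_b_sales
    )
    GROUP BY sale_date, model
    ORDER BY sale_date, model
"""

integrated = conn.execute(TRANSFORM_SQL).fetchall()
print(integrated)
```

Each additional chain would add one more `SELECT` to the `UNION ALL`, which keeps the per-source cleanup separate from the shared aggregation.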

Joe Hellerstein  |  August 30, 2021

SQL Pipelines and ELT

ELT is increasingly attractive these days. Modern data warehouses are flexible and increasingly cost-effective, allowing us to store large volumes of data—even messy data that includes volumes of text and images. In this environment, transformations occur in the data warehouse, where the native language is SQL. 

Joe Hellerstein  |  August 23, 2021

Summer of SQL: Why It’s Back

For the first decades of the millennium, it seemed like the Java-centric approach was the "hot new thing," but SQL has been roaring back. Today, SQL seems to be the focus of every data engineering conversation and is popping back up on billboards in Silicon Valley.

The comparison of the two "shops" inevitably leads to the question: which is better? There are pros and cons to emphasizing one or the other. 

Joe Hellerstein  |  August 16, 2021

Tracking Dataprep Metadata and Profile Results with Google Cloud Data Catalog

Maintain BigQuery data lineage by enriching Google Cloud Data Catalog tags with Dataprep metadata and profiling results. Cataloging Dataprep Pipelines: Google Cloud Data Catalog is the de facto metadata cataloging solution for your analytics initiatives on Google Cloud. Data Catalog natively and automatically captures BigQuery datasets, tables, and views, which gives you visibility into your data […]

Victor Coustenoble  |  May 26, 2021

Data Preparation for the Lakehouse

The Lakehouse represents a new way of implementing a data architecture. It combines the best benefits of data warehouse and data lake architectures. In particular, a Lakehouse combines the high performance and ease of use of a traditional data warehouse with the flexibility and low cost of a data lake. However, an organization seeking to […]

Vijay Balasubramaniam  |  May 25, 2021

It Takes a Village to Raise a Cloud Analytics Platform

As a person passionate about technology and working to help customers better deliver on their mission goals, I spend a lot of time thinking about patterns across the myriad of customers I’m fortunate enough to meet and work with on projects. I’m continually learning, assessing and working to understand things and as I come across […]

Brian Shealey  |  May 13, 2021

The Road to Data Preparation

Over the course of my career, I’ve used my fair share of technologies to clean and transform data. I enjoy coding and love SQL. I built ETL pipelines for 10 years. I still use Excel (and Google Sheets) quite often in my product marketing role at Trifacta. When I first discovered data preparation technologies, I […]

Bertrand Cariou  |  May 7, 2021

Trifacta’s Partner Databricks Announces GA Launch on Google Cloud

Today, our partner Databricks announced their GA launch on Google Cloud. We are very excited to have Databricks join Dataprep by Trifacta on Google Cloud Platform Marketplace. This new service will provide a simple, open lakehouse platform for data engineering, data science, analytics, and machine learning with tight integrations to Google Cloud’s analytics solutions. For […]

Matt Derda  |  May 4, 2021

The Different Approaches to “T” in ELT and What’s Required to Drive Mass Adoption

Much has been written about the shift from ETL to ELT and how ELT enables superior speed and agility for modern analytics. One important move to support this speed and agility is creating a workflow that enables data transformation to be exploratory and iterative. Preparing data for analysis requires an iterative loop of forming and […]

Sean Kandel  |  January 15, 2021

What Is ETL? ETL vs. ELT vs. Data Wrangling in the Cloud

Is ETL dead? Did ELT take over or is something new taking its place? It’s a question that has come up a lot in recent years as organizations modernize their analytics infrastructure. Huge shifts are afoot in the analytics landscape and it isn’t always clear where this change leaves ETL. The short answer? No, ETL […]

Will Davis  |  December 21, 2020

Google Sheets: Data Validation Tips & Tricks

Google Sheets is one of the most widely used spreadsheet tools. Still, many of its best features go undiscovered. Let’s take a closer look at how to do data validation in Google Sheets, which is commonly used to build drop-down lists. Why data validation matters: Data validation is like the analytic version of copyediting. As much […]

Bertrand Cariou  |  December 13, 2020

Easily Publish to Data Warehouses with New Rename Functions in Trifacta

Chances are you’re having to work with several different databases and data warehouses in your analytics stack. It just is what it is today. In order to get an accurate picture in your reporting you have to use everything. However, working with these different databases can be like, well this: When publishing tables in different […]

Nate Vaziri  |  November 24, 2020

How to Automatically Deploy a Google Cloud Dataprep Pipeline Between Workspaces

This article explains how to use Cloud Composer to automate Cloud Dataprep flow migration between two workspaces. This process can be leveraged for your Cloud Data Warehouse project to move flows between development, test, and production, following what is known as a Continuous Integration and Continuous Delivery (CI/CD) pipeline in agile development. At a high level, this […]

Connor Carreras  |  November 12, 2020

Data Preparation Best Practices for Snowflake Data Warehouses

Snowflake is a platform known for its separation of storage and compute, which makes scaling data more efficient. However, to get the most value from your investment in Snowflake’s Cloud Data Warehouse, your organization must break through the biggest bottleneck to analytics and AI: data preparation. Here are five data preparation best practices your organization […]

David McNamara  |  November 4, 2020

How to Change Date Format in Excel

When you enter a date into Microsoft Excel, the program will format it according to the default date settings. For example, if you want to enter the date February 6, 2020, the date could appear as 6-Feb, February 6, 2020, 6 February, or 02/06/2020, all depending on your settings. You may find that if you […]

Bertrand Cariou  |  November 2, 2020