Blog

Get the latest insights on data engineering

Posts about Technology

Tracking Dataprep Metadata and Profile Results with Google Cloud Data Catalog

Maintain BigQuery data lineage by enriching Google Cloud Data Catalog tags with Dataprep metadata and profiling results. Cataloging Dataprep Pipelines: Google Cloud Data Catalog is the de facto metadata cataloging solution for your analytics initiatives on Google Cloud. Data Catalog natively and automatically captures BigQuery datasets, tables, and views, which gives you visibility into your data […]

Victor Coustenoble  |  May 26, 2021

Data Preparation for the Lakehouse

The Lakehouse represents a new way of implementing a data architecture. It combines the best of data warehouse and data lake architectures. In particular, a Lakehouse combines the high performance and ease of use of a traditional data warehouse with the flexibility and low cost of a data lake. However, an organization seeking to […]

Vijay Balasubramaniam  |  May 25, 2021

It Takes a Village to Raise a Cloud Analytics Platform

As a person passionate about technology and working to help customers better deliver on their mission goals, I spend a lot of time thinking about patterns across the myriad of customers I’m fortunate enough to meet and work with on projects. I’m continually learning, assessing and working to understand things and as I come across […]

Brian Shealey  |  May 13, 2021

The Road to Data Preparation

Over the course of my career, I’ve used my fair share of technologies to clean and transform data. I enjoy coding and love SQL. I built ETL pipelines for 10 years. I still use Excel (and Google Sheets) quite often in my product marketing role at Trifacta.  When I first discovered data preparation technologies, I […]

Bertrand Cariou  |  May 7, 2021

Trifacta’s Partner Databricks Announces GA Launch on Google Cloud

Today, our partner Databricks announced their GA launch on Google Cloud. We are very excited to have Databricks join Dataprep by Trifacta on Google Cloud Platform Marketplace. This new service will provide a simple, open lakehouse platform for data engineering, data science, analytics, and machine learning with tight integrations to Google Cloud’s analytics solutions. For […]

Matt Derda  |  May 4, 2021

The Different Approaches to “T” in ELT and What’s Required to Drive Mass Adoption

Much has been written about the shift from ETL to ELT and how ELT enables superior speed and agility for modern analytics. One important move to support this speed and agility is creating a workflow that enables data transformation to be exploratory and iterative. Preparing data for analysis requires an iterative loop of forming and […]

Sean Kandel  |  January 15, 2021

What Is ETL? ETL vs. ELT vs. Data Wrangling in the Cloud

Is ETL dead? Did ELT take over, or is something new taking its place? It’s a question that has come up a lot in recent years as organizations modernize their analytics infrastructure. Huge shifts are afoot in the analytics landscape, and it isn’t always clear where this change leaves ETL. The short answer? No, ETL […]

Will Davis  |  December 21, 2020

Google Sheets: Data Validation Tips & Tricks

Google Sheets is one of the most widely used spreadsheet tools. Still, many of its best features go undiscovered. Let’s take a closer look at how to do data validation in Google Sheets, which is commonly used to build drop-down lists. Why data validation matters: Data validation is like the analytic version of copyediting. As much […]

Bertrand Cariou  |  December 13, 2020

Easily Publish to Data Warehouses with New Rename Functions in Trifacta

Chances are you’re having to work with several different databases and data warehouses in your analytics stack. It just is what it is today. In order to get an accurate picture in your reporting, you have to use everything. However, working with these different databases can be like, well, this: When publishing tables in different […]

Nate Vaziri  |  November 24, 2020

How to Automatically Deploy a Google Cloud Dataprep Pipeline Between Workspaces

This article explains how to use Cloud Composer to automate Cloud Dataprep flow migration between two workspaces. This process can be leveraged in your Cloud Data Warehouse project to move flows between development, test, and production, following what is known as a Continuous Integration and Continuous Delivery (CI/CD) pipeline in agile development. At a high level, this […]

Connor Carreras  |  November 12, 2020

Data Preparation Best Practices for Snowflake Data Warehouses

Snowflake is a platform known for its separation of storage and compute, which makes scaling data more efficient. However, to get the most value from your investment in Snowflake’s Cloud Data Warehouse, your organization must break through the biggest bottleneck to analytics and AI: data preparation. Here are five data preparation best practices your organization […]

David McNamara  |  November 4, 2020

How to Change Date Format in Excel

When you enter a date into Microsoft Excel, the program will format it according to the default date settings. For example, if you want to enter the date February 6, 2020, the date could appear as 6-Feb, February 6, 2020, 6 February, or 02/06/2020, all depending on your settings. You may find that if you […]

Bertrand Cariou  |  November 2, 2020

Publishing Data to Snowflake Using Trifacta Data Quality Rules 

When publishing data to cloud data warehouse Snowflake for analytic use, data quality is of the utmost importance. Improperly curated data threatens the validity of the end analysis.  Data Quality Rules in Trifacta accelerates the process of ensuring data quality by automatically generating a list of data quality rules for users to select from and […]

Matt Derda  |  October 27, 2020

How to Use Trifacta and Snowflake to Prepare Data for Home Price & Rental Analysis

If you are using Snowflake as your cloud analytics platform, Trifacta can help accelerate the process of data preparation and cleaning. In this demo, we will demonstrate how to use Trifacta to accelerate the process of preparing data before publishing the results to cloud data warehouse Snowflake. Specifically, we will showcase finding the price-to-rent ratio […]

Brandon Hoang  |  October 27, 2020

What Is a Customer Data Platform? A Guide to CDPs

Today’s customers leave digital footprints behind just about every purchase. Any given buyer may start by searching on Google, visiting an eCommerce store, cross-referencing on Amazon or Google Shopping, reviewing the company’s social media channels—and several times back again—before finally making a purchase.  Gathering this kind of data is certainly helpful. But being able to […]

Matt Derda  |  October 26, 2020

Understanding Automated Cloud Data Warehouse with BigQuery and Looker

This blog illustrates how the combination of Cloud Dataprep, Looker, and BigQuery fulfills the three necessary elements for a scalable, self-service data warehouse, a.k.a. self-service analytics. What is self-service analytics? Self-service analytics empowers the everyday business user to create their own end-to-end analytics solution—that is, accessing data, preparing and cleansing it for use, and generating […]

Bertrand Cariou  |  October 22, 2020

Predicting COVID-19 Cases with Machine Learning and Trifacta

In the fight against COVID-19, one of the best weapons at our disposal is data. But interpreting COVID-19 data isn’t always cut and dried. There’s no blueprint for a novel virus; instead, the global scientific community has had to sift through complex and ever-evolving data and, bit by bit, begin to assemble an understanding of […]

Bertrand Cariou  |  October 14, 2020

How to Extend Cloud Dataprep by Using BigQuery JavaScript UDFs

Since Trifacta is a data company, we try to be as data-driven as possible. This means that product usage data analysis informs many of our product, sales, and marketing decisions. Our team has built our usage data pipeline entirely in GCP, so we can use both our own technology (Google Cloud Dataprep) and native GCP […]

Connor Carreras  |  October 12, 2020