The Different Approaches to “T” in ELT and What’s Required to Drive Mass Adoption
Much has been written about the shift from ETL to ELT and how ELT enables superior speed and agility for modern analytics. One important move to support this speed and agility is creating a workflow that enables data transformation to be exploratory and iterative. Defining an analysis requires an iterative loop of forming and testing […]
Google Sheets: Data Validation Tips & Tricks
Google Sheets is one of the most widely used spreadsheet tools. Still, many of its best features go undiscovered. Let’s take a closer look at how to do data validation in Google Sheets, which is commonly used to build drop-down lists. Why data validation matters Data validation is like the analytic version of copyediting. As much […]
Easily Publish to Data Warehouses with New Rename Functions in Trifacta
Chances are you’re having to work with several different databases and data warehouses in your analytics stack. It just is what it is today. In order to get an accurate picture in your reporting, you have to use everything. However, working with these different databases can be like, well, this: When publishing tables in different […]
How to Automatically Deploy a Google Cloud Dataprep Pipeline Between Workspaces
This article explains how to use Cloud Composer to automate Cloud Dataprep flow migration between two workspaces. This process can be leveraged in your Cloud Data Warehouse project to move between development, test, and production, following what is known as a Continuous Integration and Continuous Delivery (CI/CD) pipeline in agile development. At a high level, this […]
Data Preparation Best Practices for Snowflake Data Warehouses
Snowflake is a platform known for its separation of storage and compute, which makes scaling data more efficient. However, to get the most value from your investment in Snowflake’s Cloud Data Warehouse, your organization must break through the biggest bottleneck to analytics and AI: data preparation. Here are five data preparation best practices your organization […]
How to Change Date Format in Excel
When you enter a date into Microsoft Excel, the program will format it according to the default date settings. For example, if you want to enter the date February 6, 2020, the date could appear as 6-Feb, February 6, 2020, 6 February, or 02/06/2020, all depending on your settings. You may find that if you […]
Publishing Data to Snowflake Using Trifacta Data Quality Rules
When publishing data to cloud data warehouse Snowflake for analytic use, data quality is of the utmost importance. Improperly curated data threatens the validity of the end analysis. Data Quality Rules in Trifacta accelerates the process of ensuring data quality by automatically generating a list of data quality rules for users to select from and […]
How to Use Trifacta and Snowflake to Prepare Data for Home Price & Rental Analysis
If you are using Snowflake as your cloud analytics platform, Trifacta can help accelerate the process of data preparation and cleaning. In this demo, we will show how to use Trifacta to speed up data preparation before publishing the results to the Snowflake cloud data warehouse. Specifically, we will showcase finding the price-to-rent ratio […]
What Is a Customer Data Platform? A Guide to CDPs
Today’s customers leave digital footprints behind just about every purchase. Any given buyer may start by searching on Google, visiting an eCommerce store, cross-referencing on Amazon or Google Shopping, reviewing the company’s social media channels—and several times back again—before finally making a purchase. Gathering this kind of data is certainly helpful. But being able to […]
Understanding Automated Cloud Data Warehouse with BigQuery and Looker
This blog illustrates how the combination of Cloud Dataprep, Looker, and BigQuery fulfills the three necessary elements for a scalable, self-service data warehouse, a.k.a. self-service analytics. What is self-service analytics? Self-service analytics empowers the everyday business user to create their own end-to-end analytics solution, that is, accessing data, preparing and cleansing it for use, and generating […]
Predicting COVID-19 Cases with Machine Learning and Trifacta
In the fight against COVID-19, one of the best weapons at our disposal is data. But interpreting COVID-19 data isn’t always cut and dried. There’s no blueprint for a novel virus; instead, the global scientific community has had to sift through complex and ever-evolving data and, bit by bit, begin to assemble an understanding of […]
How to Extend Cloud Dataprep by Using BigQuery JavaScript UDFs
Since Trifacta is a data company, we try to be as data-driven as possible. This means that product usage data analysis informs many of our product, sales, and marketing decisions. Our team has built our usage data pipeline entirely in GCP, so we can use both our own technology (Google Cloud Dataprep) and native GCP […]
Snowflake Software and Trifacta
The cloud computing market is often boiled down to a race between the “Big Three” cloud providers—Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). But while these platforms may anchor an organization’s cloud strategy, they are far from the full picture. The cloud computing market is made up of numerous technologies, products, […]
Cleaning and Preparing Data from Snowflake Databases
What is Snowflake database software? Snowflake is a cloud data warehouse that provides various layers for cloud services, query processing, and database storage. Snowflake’s unique architecture provides many advantages over traditional data warehouses given its infrastructure as a service, allowing for agile and scalable storage and processing of data in the cloud. Additionally, Snowflake’s architecture […]
Snowflake Data Prep for Data Scientists, Data Analysts and Data Engineers
Snowflake’s unique architecture allows organizations to store a wider variety of data formats and data types, including most SQL data types. The diversity of Snowflake databases and Snowflake types of data opens up new possibilities for creating insight-rich data using a data preparation platform like Trifacta. Whether you use Snowflake as a cloud data warehouse […]
Data Preparation in an AWS Data Lake
Before we jump into the definition of an AWS data lake, let’s review why data lakes are important in the first place. A data lake is a central repository capable of storing both structured and unstructured data. The concept of a data lake is only about 10 years old, but it has already reengineered the […]
Data Enrichment in the Cloud – Why Data Marketplaces need Data Prep
These days, we can’t get enough of the cloud. Everyone is moving to, or, more precisely, has moved to the cloud for some portion, if not all, of their data analytics. And for good reason. When you need spare compute capacity to do a one-time analysis, you just spin up some servers, run your analytics […]
Leverage Cloud Functions and APIs to Monitor Cloud Dataprep Jobs Status in a Google Sheet
If you manage a data and analytics pipeline in Google Cloud, you may want to monitor it and obtain a comprehensive view of the end-to-end analytics process in order to react quickly when something breaks. This article shows you how you can capture the status of Cloud Dataprep jobs via APIs, leveraging Cloud Functions. We then input […]