What Is a Data Stack and How Does It Impact Analytics?

December 16, 2020

We hear a lot about organizations undergoing “data modernization” in order to become more data-driven. Essentially what that means is that these organizations have recognized that legacy data tools aren’t very good at solving modern data problems. They’re in the process of moving data out of legacy mainframe databases and, at the same time, replacing legacy systems with an updated solution—one that is commonly referred to as a “modern data stack.” 

So what does a modern data stack look like? And how does it deliver on its promise of increasing and improving analytics? Read on to learn more. 

What is a data stack?

Data has been called “the new oil of the digital economy” in that it’s one of the most valuable (and not yet fully tapped) assets that an organization has. But pieces of data by themselves don’t have much worth. Data must first be compiled, organized, cleaned, and put to use in an analytics project in order to generate value. The combination of technologies that usher data through those steps is what makes up a data stack.  

A good analogy for a data stack is cooking. Before you end up with a meal, you must source and store ingredients in your kitchen, prepare them, and put everything in the oven; only then do you have a ready-to-eat meal. Having a central kitchen is essential. If the ingredients were scattered across different places, it would be hard to predict what kind of meal you could produce; with everything side by side, it’s easy to envision several recipes.

Much like a kitchen, a cloud data warehouse is the center of a data stack. In traditional data warehouses, data was siloed and difficult to access. Cloud data warehouses are much friendlier to the self-service technologies that have become the norm among analysts, and their flexible storage and elasticity often reduce costs and improve performance.

To see how marketing agency Callahan built a data stack around a cloud data warehouse, watch its demonstration here.

Let’s take a closer look at the four key components of a data stack.

  1. Loading
    Technologies that fall under this category are responsible for moving data from one place to another. A great example of a vendor that covers this part of the stack is Fivetran.
  2. Warehousing
    These are the technologies that allow organizations to store all their data in one place. Cloud-based data warehouses are the basis of modern data stacks; examples include Google BigQuery, Amazon Redshift, Snowflake, and Databricks.
  3. Transforming
    This is the stage that turns “raw” data into “refined” data—in other words, makes data usable for analytics. Most organizations will use a “data preparation platform” for this stage (more on that below). The industry leader in data preparation is Trifacta.
  4. Analytic Use
    At this point, organizations begin to derive meaningful insights from their data by funneling it into machine learning models, serving it up to stakeholders as reports or visualizations, or using it as the basis of data applications. Examples of analytics vendors abound; a few common ones include Looker, Google Data Studio, Tableau, and Amazon SageMaker (for ML models).
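To make the four stages concrete, here is a minimal sketch in Python. It uses the standard library’s sqlite3 module as a stand-in for a cloud data warehouse. The table names, sample records, and function names are all hypothetical and invented for illustration; a real stack would use dedicated tools like those listed above for each stage.

```python
import sqlite3

# Hypothetical raw records, standing in for data pulled from a source system.
RAW_ORDERS = [
    ("1001", " Alice ", "19.99"),
    ("1002", "bob", "5.00"),
    ("1003", "ALICE", ""),  # missing amount
]


def load(conn, rows):
    """Loading: copy source rows into the warehouse unchanged."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transforming: derive a refined table from the raw one, inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount <> ''
        """
    )


def analyze(conn):
    """Analytic use: total spend per customer, ready for a report."""
    return conn.execute(
        "SELECT customer, ROUND(SUM(amount), 2) "
        "FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()


def run_stack():
    # Warehousing: one central store (in-memory here, cloud-hosted in practice).
    conn = sqlite3.connect(":memory:")
    load(conn, RAW_ORDERS)
    transform(conn)
    return analyze(conn)
```

Calling run_stack() walks the sample records through all four stages in order. Note that the raw table stays in the warehouse after the refined one is created, so a different transformation can be applied to it later.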

The rise of ELT

Ushering data throughout an organization hasn’t always followed this order. Data modernization has not only called for a new stack of technologies, but also a new way of building data pipelines. 

Before cloud data warehouses, most organizations relied on an ETL process: extract data from internal systems and external sources, transform it into a format for storage, and load it into databases. This process made sense when a small team of developers controlled the organization’s data. Now, there are far too many teams and users that need data for a small group to handle the entire process of preparing data and serving it up to them. On top of that, shoehorning modern, complex data types into one format for storage isn’t efficient or conducive to data exploration.

An ELT process like the one outlined above—where organizations have the flexibility to load data into warehouses before it is transformed and then allow business users to transform it themselves—is a much more efficient approach.
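The difference in ordering can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a real pipeline: the warehouse is just a dictionary, and the function names, table names, and sample rows are hypothetical.

```python
# Hypothetical source rows; field names are invented for illustration.
ROWS = [
    {"customer": "alice", "amount": 19.99, "region": "EU"},
    {"customer": "bob", "amount": 5.00, "region": "US"},
]


def etl(source_rows, transform):
    """ETL: transform first, then load. Only the transformed shape is stored."""
    return {"orders": [transform(row) for row in source_rows]}


def elt(source_rows, transforms):
    """ELT: load raw rows first, then let each team transform them afterwards."""
    warehouse = {"raw_orders": list(source_rows)}
    for table_name, transform in transforms.items():
        warehouse[table_name] = [transform(r) for r in warehouse["raw_orders"]]
    return warehouse


def for_finance(row):
    # Finance only cares about who paid how much.
    return {"customer": row["customer"], "amount": row["amount"]}


def for_marketing(row):
    # Marketing only cares about where customers are.
    return {"customer": row["customer"], "region": row["region"]}


etl_warehouse = etl(ROWS, for_finance)
elt_warehouse = elt(
    ROWS, {"finance_orders": for_finance, "marketing_orders": for_marketing}
)
```

In the ETL version, the region field is gone by the time the data lands, so the marketing team would have to ask for a new pipeline. In the ELT version, the raw rows are still in the warehouse, and each team derives its own table from them.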

The main advantages of ELT include:

  • Reduced time: An ETL process requires the use of a staging area and system, which means extra time to load data; ELT does not.
  • Increased usability: Business users can own business logic, instead of a small IT team using Java, Python, Scala, etc. to transform data.
  • More cost-effective: Using SaaS solutions, an ELT data stack can scale up or down to the needs of an organization; traditional ETL tooling was designed with large organizations in mind.
  • Improved analytics: Under ELT, business users can apply their unique business context to the data, which often leads to better results.

Why the “T” in a data stack is so important

We’ve been talking a lot about the “T” in the data stack—the process of transforming data for analytic use. Let’s take a closer look at why this stage is so important. 

First, let’s go back to our cooking metaphor. A good analogy for transforming data is food preparation. The work it takes to move from raw ingredients to a complete meal is a critical activity, and one that largely dictates the quality of your meal. While there are some food preparation tasks that can be applied to all ingredients (washing, removing stems, etc.), by and large, each ingredient will be prepared differently when cooking different meals. Data works similarly. 

There is no “one-size-fits-all” data preparation. Each analytic project will demand different data preparation steps and have different data quality standards. But the commonality in all data preparation jobs is that no matter how the data is transformed, that outcome will be the very foundation of the final analysis—for better or for worse. Performed correctly, data preparation can lead to deeper insights, even beyond the intended scope of analysis. Each step in the data preparation process exposes new potential ways that the data might be “re-wrangled,” all driving towards the goal of generating the most robust final analysis.
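As a rough illustration of how quality standards can differ between projects, the sketch below profiles a numeric column for missing values and outliers. The function name, the threshold parameter, and the sample values are hypothetical, and the median-based rule is just one simple choice among many; it is not a description of how any particular data preparation product works.

```python
from statistics import median


def profile(values, outlier_factor=3.0):
    """Summarize a numeric column: missing count and values far from the median."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if not present:
        return {"missing": missing, "outliers": []}

    mid = median(present)
    # Median absolute deviation: a measure of spread that outliers barely affect.
    mad = median(abs(v - mid) for v in present)
    # If mad is 0 (all values nearly identical), nothing is flagged.
    outliers = [
        v for v in present if mad and abs(v - mid) > outlier_factor * mad
    ]
    return {"missing": missing, "outliers": outliers}


# Hypothetical column with one gap and one suspicious value.
ORDER_TOTALS = [10, 12, 11, None, 13, 500]

lenient = profile(ORDER_TOTALS)                     # default threshold
strict = profile(ORDER_TOTALS, outlier_factor=1.5)  # tighter standard
```

With the default threshold only the value 500 is flagged, while the stricter setting also flags 10. Which of the two is the right call depends on the analysis the data is being prepared for.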

While IT often retains responsibility for large-scale data transformation tasks to ensure a single version of the truth, it’s business users who need to own the finishing steps in cleansing and preparing data. Having the right business context allows these users to ultimately decide what’s acceptable, what needs refining, and when to move on to analysis.

A modern, ELT data stack that includes a data preparation platform like Trifacta has allowed business users to assume this responsibility. And it has radically changed the way that analytics is performed across an organization. There is less friction in obtaining data, more of the right eyes on how it should be transformed, and increased room for exploration in how the analysis could be changed. 

Trifacta for the modern data stack

Trifacta is widely recognized as the industry leader in data preparation. Using it in conjunction with the other technologies that make up the modern data stack, organizations are building automated data pipelines that dramatically improve the efficiency of analytics.

Trifacta’s machine-learning powered platform acts as an invisible hand during the data preparation process, guiding users toward the best possible transformation. Its visual interface automatically surfaces errors, outliers, and missing data, and it allows users to quickly edit or redo any transformation. Finally, it integrates with modern data stack technologies for seamless, automated data pipelines. 

Learn why organizations are incorporating Trifacta as a key part of their data stack strategy today. Schedule a free demo from our team or get started right away with Trifacta on the platform of your choice.