Data School

What Is Data Modeling and Why Does It Matter?

October 22, 2020

Data doesn’t exist in a vacuum; understanding the relational nature of data is key to understanding its value. For example, what good would customer IDs be to a product team if those IDs didn’t coincide with the specific products that customers bought? Or, how would a marketing team conduct pricing analysis without being able to pull data that connects price points to specific products? 

The process of assigning relational rules to data, such as those mentioned above, is called data modeling. Though data modeling itself is a highly technical process, it should involve business stakeholders, who help define the business requirements. Undertaking data modeling requires an organization to pin down the inner workings of its business operations in order to best define the data—and the necessary structure of that data—that fuels those operations. 

Read on to learn more specifics about data modeling and its benefits to an organization. 

What is data modeling?

At a high level, data modeling is the process of visualizing and representing data for storage in a data warehouse. The model is a conceptual representation of the data, the relationships between data, and the rules. The modeling itself can include diagrams, symbols, or text to represent data and the way that it interrelates. Because of the structure that data modeling imposes upon data, the process of data modeling subsequently increases consistency in naming, rules, semantics, and security, while also improving data analytics. 


Types of data models

There are three main types of data models that organizations use. Each type of data model serves a different purpose and has its own advantages. 

Conceptual data model

A conceptual data model is a visual representation of database concepts and the relationships between them. Typically, a conceptual data model won’t include details of the database itself but instead focuses on establishing entities, characteristics of an entity, and relationships between them. These data models are created for a business audience, especially key business stakeholders.  
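
Entities and relationships at the conceptual level are usually drawn as diagrams rather than written as code, but they can be sketched in code as well. Below is a minimal illustration using Python dataclasses, with hypothetical retail entities (`Customer`, `Product`, `Order`) that do not come from the article itself:

```python
from dataclasses import dataclass, field

# Hypothetical entities for a retail domain, for illustration only.
# A conceptual model names entities, their characteristics, and their
# relationships; storage details (types, keys, indexes) come later.

@dataclass
class Customer:
    name: str                     # a characteristic of the Customer entity

@dataclass
class Product:
    name: str
    price: float                  # a characteristic of the Product entity

@dataclass
class Order:
    customer: Customer            # an Order is placed by one Customer
    products: list = field(default_factory=list)  # an Order contains many Products
```

The point of the sketch is the relationships: an order references one customer and many products, which is exactly the kind of structure a conceptual diagram would show to business stakeholders.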


Logical data model

A logical data model is often the next step after conceptual data modeling. This data model further defines the structure of the data entities and sets the relationships between them, and the attributes of each data entity are clearly defined. Usually, a logical data model is used for a specific project, since the project will have particular requirements for the structure; the model can still be integrated into other logical models to provide a better understanding of the scope. At this level of data modeling, normalization is applied up to third normal form (3NF), but no primary or secondary keys are defined yet. 
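
To make the normalization step concrete, here is a small sketch of what moving toward 3NF looks like, using hypothetical order records (the field names and values are invented for illustration). Attributes that depend only on the customer or only on the product are split out into their own entities:

```python
# A hypothetical denormalized record set: customer and product details are
# repeated in every order row, which violates third normal form (3NF).
orders_raw = [
    {"order_id": 1, "customer": "Ada", "city": "London", "product": "Widget", "price": 9.99},
    {"order_id": 2, "customer": "Ada", "city": "London", "product": "Gadget", "price": 4.50},
]

# Normalizing toward 3NF: attributes that depend only on the customer or
# only on the product move into their own entities, and each order row
# keeps just the relationships between them.
customers = {row["customer"]: {"city": row["city"]} for row in orders_raw}
products = {row["product"]: {"price": row["price"]} for row in orders_raw}
orders = [
    {"order_id": row["order_id"], "customer": row["customer"], "product": row["product"]}
    for row in orders_raw
]
```

After the split, "Ada lives in London" is stored once rather than once per order, which is the kind of redundancy 3NF removes.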


Physical data model 

A physical data model is used for database-specific modeling. Just like with the logical model, a physical model is used for a specific project but can be integrated with other physical models for a comprehensive view. The model goes into more detail with columns, constraints, and primary and foreign keys. The columns will include exact types and attributes in this model, and the data should be normalized as well. A physical model designs the internal schema. 
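
A minimal sketch of a physical model, using Python's built-in `sqlite3` and the same hypothetical customer/product/order entities used above (table and column names are invented for illustration). At this level, exact column types, primary and foreign keys, and constraints are pinned down:

```python
import sqlite3

# An in-memory SQLite database standing in for the target system.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys in SQLite

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    price       REAL NOT NULL CHECK (price >= 0)  -- a column constraint
);
CREATE TABLE "order" (
    order_id    INTEGER NOT NULL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id)
);
""")
```

With the foreign keys in place, the database itself rejects an order that points at a nonexistent customer, which is the enforcement the logical model only described.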


Why use data models?

Data modeling might seem like an abstract process, far removed from the data analytics projects that drive concrete value for the organization. But data modeling is necessary foundational work that not only makes data easier to store in a database but also improves data analytics.

These are some of the key benefits of data modeling and why organizations will continue to use data models: 

  • Higher quality data.
    The visual depiction of requirements and business rules allows developers to foresee what could become large-scale data corruption before it happens. Plus, data models allow developers to define rules that monitor data quality, which reduces the chance of errors.
  • Increased internal communication about data and data processes.
    Creating data models is a forcing function for the business to define how data is generated and moved throughout applications.
  • Reduced development and maintenance costs.
    Because data modeling surfaces errors and inconsistencies early on in the process, they are far easier and cheaper to correct.
  • Improved performance.
    An organized database is a more efficiently operated one; a well-modeled schema avoids needless scanning and returns query results faster. 
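
The "rules that monitor data quality" mentioned in the first benefit can be sketched as simple automated checks. The rules and field names below are hypothetical, chosen only to show the shape of the idea:

```python
# Hypothetical data-quality rules for a customer record. A data model's
# rules can be checked automatically before data lands in the warehouse.
RULES = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def violations(row):
    """Return the names of fields in `row` that are missing or break a rule."""
    return [
        field for field, ok in RULES.items()
        if field not in row or not ok(row[field])
    ]
```

A row that passes every rule yields an empty list; anything else names the offending fields, catching errors long before they corrupt downstream analytics.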

Data quality: Enemy #1 of data models

In the process of creating data models, one of the biggest challenges is addressing all possible data quality issues. Though the end result—higher-quality data—is better than the alternative, it can be an uphill battle trying to predict and prevent every scenario. 

What makes this more difficult is that the definition of “good” data will often vary from project to project, even for the same data. An in-depth understanding of the use case will ultimately determine the data quality issues that matter most and what “good enough” looks like. What data is most essential to the success of the project? What level of quality is really necessary? How significant are the risks of bad data? 

Organizations can spend untold resources in an attempt to achieve pristine data quality, but that investment may not be justified by how much it actually moves the needle. For example, some use cases may necessitate remediating every null value; others may not. It’s important for organizations to know upfront what is important to the use case so they can maximize return on effort (RoE) and define clear data SLAs.
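
A "good enough" check in this spirit might compare each column's null rate against a use-case-specific SLA threshold rather than remediating every null. The column names and thresholds below are hypothetical, and the sketch assumes a non-empty list of row dictionaries:

```python
def null_rates(rows, columns):
    """Fraction of rows in which each column is missing or None.

    Assumes `rows` is a non-empty list of dicts.
    """
    total = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }

def meets_sla(rows, sla):
    """`sla` maps column name -> maximum acceptable null rate."""
    rates = null_rates(rows, sla.keys())
    return all(rates[col] <= limit for col, limit in sla.items())
```

A use case that only needs `price` 50% populated can pass its SLA even while a stricter SLA on another column fails, which is exactly the per-use-case judgment the paragraph above describes.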

Data modeling and data preparation

To solve the challenge of data quality, organizations are increasingly involving business users. Instead of burdening developers with the task of building data models and remedying all data quality issues, business users now have the means to prepare data themselves for specific analytic initiatives by way of modern data preparation platforms. 

For one, this is a more efficient approach—instead of a small task force chasing down issues of data quality, there are more eyes on the data—but it also leads to better curation for the end analysis. IT still curates the most important data and ensures it is sanctioned and reused, which maintains a single version of the truth and increases efficiency. But, with business context and ownership over the finishing steps in cleansing and data preparation, these users can ultimately decide what’s acceptable, what needs refining, and when to move on to analysis.

Trifacta has been routinely named the industry-leading data preparation platform. Its machine-learning-powered platform acts as an invisible hand during the data preparation process, guiding users towards the best possible transformation. Its visual interface automatically surfaces errors, outliers, and missing data, and it allows users to quickly edit or redo any transformation. Finally, it integrates with essential customer applications and can pull in data from anywhere within the organization. 

To see more of how Trifacta can support more efficient data modeling, request a demo today.