Organizations have long been fixated on a complex data architecture challenge: how can data silos be eliminated? One answer came in the form of centralizing all data into a data “lake,” thereby giving users a single place to hunt for data instead of various data warehouses. 

Yet several key issues prevented the data lake from becoming a roaring success. One, it presented a significant maintenance challenge for organizations. And two, perhaps most importantly, not everything could be dumped into the lake as envisioned. Inevitably, organizations found themselves safeguarding sensitive data in on-premises storage, creating data warehouses for departmental use, generating data in SaaS applications, or receiving data directly from external sources. 

The result? Even with a data lake, most organizations found that they still suffered from the siloed data problem. Users couldn’t locate all the data available to them in one central location, and the quality of organizational analytics suffered for it. 

Recently, new data management techniques have emerged to solve this lingering problem. One such technique is called the “data hub.”

What is a Data Hub? 

A data hub operates on a hub-and-spoke model; that is, all sorts of data sources within the organization, such as data lakes, data warehouses, and SaaS applications, are connected to a central “hub.” In this way, a data hub embraces the diversity of data sources that an organization will inevitably accumulate, but anchors them to a single platform. 

However, a data hub isn’t just the latest version of a central physical repository; unlike data lakes or data warehouses, a data hub isn’t intended to store data. Instead, think of a data hub as a gateway through which data moves, either virtually (through search) or physically, but only temporarily, as it passes from one application to the next.

In that way, a data hub acts as both a map and a transport system for the wide-ranging data sources throughout the organization. It allows users to easily search for, access, and process data, no matter the type of data or whether that data lives in the cloud or on-premises.
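To make the map-and-transport idea concrete, here is a minimal sketch of the pattern in Python. The `DataHub` class, dataset names, and `fetch` callbacks are all hypothetical illustrations of the concept, not a real data hub product's API: spokes register the datasets they own, users search the central catalog virtually, and data only passes through the hub when it is actually requested.

```python
# Hypothetical hub-and-spoke sketch; all names are illustrative.

class DataHub:
    """Central registry mapping logical dataset names to their sources."""

    def __init__(self):
        # dataset name -> (owning source system, function that pulls the data)
        self._catalog = {}

    def register(self, name, source, fetch_fn):
        """A spoke (warehouse, lake, SaaS app) registers a dataset it owns."""
        self._catalog[name] = (source, fetch_fn)

    def search(self, term):
        """Virtual access: find matching datasets without moving any data."""
        return [n for n in self._catalog if term.lower() in n.lower()]

    def fetch(self, name):
        """Physical access: pull the data through the hub on demand."""
        source, fetch_fn = self._catalog[name]
        return fetch_fn()


hub = DataHub()
hub.register("sales_orders", "warehouse", lambda: [{"order_id": 1}])
hub.register("sales_leads", "crm_saas", lambda: [{"lead_id": 7}])

print(hub.search("sales"))       # both datasets, found in one place
print(hub.fetch("sales_orders"))  # data moves only when requested
```

Note that the hub stores only the catalog entry, never the data itself; the data stays in the warehouse or SaaS application until someone asks for it.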

What are the benefits of a data hub?

  1. Counteract data silos with a single, unified interface
    The most obvious benefit of a data hub is its ability to break down data silos by linking all data sources through a single interface. Gone are the days of hunting for data across various sources; instead, a data hub puts the data users need at their fingertips.
  2. Increased data access and improved analytics
    Consequently, having a single vantage point for all data throughout the organization increases data access, thereby improving analytics and innovation.
  3. Reduced technical debt
    Some organizations have tried to connect a growing number of disconnected systems and applications through point-to-point integrations. However, that strategy is severely limited in its ability to scale; most often, organizations find that the number of point-to-point integrations grows quadratically with the number of systems and only increases their technical debt. A data hub sidesteps this problem by providing a central platform that all applications connect to, instead of trying to connect every system to every other system.
  4. Centralized governance
    A data hub can act as a governance body by allowing organizations to easily control who is granted access to which data based on their profile and permissions.
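The scaling argument behind the "reduced technical debt" benefit is simple arithmetic: fully connecting n systems point-to-point requires n(n-1)/2 integrations, while a hub requires only n connections, one per system. A quick sketch:

```python
# Integration-count arithmetic for n systems.

def point_to_point(n):
    """Integrations needed to connect every system to every other system."""
    return n * (n - 1) // 2

def hub_connections(n):
    """Integrations needed when every system connects only to the hub."""
    return n

for n in (5, 10, 50):
    print(f"{n} systems: {point_to_point(n)} point-to-point vs {hub_connections(n)} hub")
# At 50 systems, that is 1,225 integrations versus 50.
```

This quadratic-versus-linear gap is why point-to-point strategies feel manageable at first and then buckle as systems accumulate.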

What’s next?

While a data hub provides many benefits, it’s important to remember that it’s not a “be-all and end-all” solution. As you move forward in your research and implementation of a data hub solution, consider other essential elements that make a data hub successful. 

For example, data preparation and the development of automated, reliable data pipelines must be a factor in your data hub journey. Without the ability to prepare or schedule data pipelines for specific analytic needs, users may find that the data hub is just that—a hub for data, but not a starting point for analytics. 
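What "preparing a data pipeline" means can be sketched in a few lines. This is a hypothetical, minimal example, with invented step functions and field names, of chaining preparation steps so that cleaned, analytics-ready data leaves the hub rather than raw data:

```python
# Hypothetical pipeline sketch: each preparation step is a plain function.

def drop_incomplete(rows):
    """Remove records missing the field analytics depends on."""
    return [r for r in rows if r.get("amount") is not None]

def add_category(rows):
    """Enrich each record with a derived field for analysis."""
    return [{**r, "size": "large" if r["amount"] >= 100 else "small"}
            for r in rows]

def run_pipeline(rows, steps):
    """Apply each preparation step in order."""
    for step in steps:
        rows = step(rows)
    return rows


raw = [{"amount": 150}, {"amount": None}, {"amount": 20}]
prepared = run_pipeline(raw, [drop_incomplete, add_category])
print(prepared)
```

In practice such pipelines would also be scheduled and monitored; the point is that without this preparation layer, the hub delivers data but not a starting point for analytics.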

Additionally, consider how a data hub may function in a broader data fabric strategy, which more deeply depends on metadata as the “fabric” between all data sources. Both strategies have promising roles within the future of data architecture and will continue to make an impact on a variety of data-driven organizations. 

If you liked this post, read more about our thoughts on the latest trends in data engineering on the Trifacta blog.