From Raw to Refined: Mapping Users to Your Data Lake (Part 2)

May 11, 2016

In this two-part series, we’re talking about the Hadoop data lake, in terms of both the necessary components and the people involved. Our first post covered the different staging areas of the lake and what they should accomplish.

Data is flowing into businesses faster than ever. From marketing to customer support to operations, each part of the organization has its own objectives. To accommodate this growing volume and variety of data, many IT organizations are choosing to adopt data lakes. As explained in part 1, data moves through a Hadoop data lake in four zones, from landing to production.

However, to build a successful Hadoop data lake, one that fosters adoption across the business units, the organization needs to evolve. IT’s role in supporting the Hadoop data lake should shift from implementer and gatekeeper to enabler and trusted resource. To evolve effectively and create a successful Hadoop data lake, organizational roles must be aligned with team capabilities and resources.

Below, we’ve outlined best practices for organizational alignment around the Hadoop data lake:

LANDING ZONE

Why it’s important: The landing zone preserves data in its native format, maintaining data provenance and fidelity all in real time.

Who should own it: While the landing zone has traditionally been the realm of IT, next-gen data preparation and wrangling tools, such as Trifacta, have made it easy for the business to handle its own data requirements with little IT involvement.

REFINERY ZONE

Why it’s important: The refinery zone is where minimally processed data with minimal security constraints is used for discovery, exploration, and experimentation.

Who should own it: IT has traditionally owned this zone as well, transforming and standardizing raw data that can’t be used as is. With data wrangling tools such as Trifacta, however, business users (primarily data scientists and data analysts) can use this zone to explore the data and share datasets for team collaboration.

PRODUCTION ZONE

Why it’s important: The production zone is like a production website: it is where business data is stored in a clean, structured format that informs critical business decisions and drives efficient operations. The quality of this data depends heavily on the data preparation work done in the preceding zones.

Who should own it: Here, IT automates the business and data transformation rules to deliver controlled and validated outcomes, but the production zone should meet the needs of its users in the business units, as it’s where most will do their analyses.

Proper data preparation is critical at every stage of the process. We’ve created a helpful summary of the zones and the ownership:

[Figure: Raw to Refined, a summary of the zones and their ownership]
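
For teams that want to keep this alignment alongside their own tooling, the zone descriptions above can be written down as a small lookup table. The following is only a minimal sketch in Python: the `Zone` class, its field names, and the `owner_of` helper are hypothetical and are not part of Trifacta or Hadoop, and the table includes only the three zones described in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """One staging area of the data lake (hypothetical structure for illustration)."""
    name: str
    purpose: str
    traditional_owner: str
    suggested_owner: str


# Values paraphrase the zone descriptions in this post; they are not an official schema.
ZONES = [
    Zone("landing", "preserve raw data in its native format",
         "IT", "business users, with little IT involvement"),
    Zone("refinery", "discovery, exploration, and experimentation",
         "IT", "data scientists and data analysts"),
    Zone("production", "clean, structured data for business decisions",
         "IT", "IT, serving the needs of the business units"),
]


def owner_of(zone_name: str) -> str:
    """Return the suggested owner for a zone, matching the name case-insensitively."""
    for zone in ZONES:
        if zone.name == zone_name.lower():
            return zone.suggested_owner
    raise KeyError(zone_name)


if __name__ == "__main__":
    for zone in ZONES:
        print(f"{zone.name}: {zone.traditional_owner} -> {zone.suggested_owner}")
```

A table like this could serve as a starting point for documenting who approves access or changes in each zone; the actual roles will differ from one organization to another.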

The bottom line

The right self-service data preparation tools will foster adoption of the data lake and give it the best chance of success. With next-generation tools, your non-technical users can access data in the big data ecosystem quickly while augmenting existing data governance policies and without jeopardizing security or accuracy.

Help ensure that the business is maximizing the potential of its resources by not relegating everything to technical employees. This way, other IT objectives are not compromised, and business users will be more fully engaged with the data lake, so that the entire organization reaps the benefits of the Hadoop data lake.

To learn more about how Trifacta fits into the context of your data lake, download our white paper, “Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance.”
