Leveraging On-Cluster Visualization for Faster Insights

December 7, 2016

We’ve talked a lot about the data lake ecosystem on our blog to help organizations accelerate adoption and spearhead new data lake-driven initiatives. In this post, we expand upon best-of-breed data lake architecture by explaining how modern BI and visualization tools have adapted to support the data lake concept.

Let’s assume that you’ve managed to ingest data into your lake, have put together a strong governance layer to organize it, and are using Trifacta to let users explore and prepare that data themselves. Now you’re ready to construct the visual dashboards and reports needed to drive insight for your business.


When it comes to analytics, ease of use and quick turnaround are vital, yet the growing volume and variety of data that users work with can get in the way of smooth analytic delivery. On top of that, increased demand from users across the organization puts more pressure on analysts to get it right.

We find that organizations are better off adopting a modern process: instead of bringing the data (or, in most cases, only a portion of it) to analysts, bring the analysts to the data. Trifacta partner Arcadia Data does just that, bringing business intelligence to end users on top of Hadoop. In this post, we’ll take a closer look at how they do it.

One trusted place to explore and share

Traditional BI and visualization tools usually work for smaller data sets, but when it comes to large volumes of data in Hadoop and other big data technologies, they may have limitations on scale and other enterprise capabilities.

To work around this, the traditional approach to BI has been to create many data marts or enterprise data warehouses for different granularities and aggregations, which in turn leaves the systems disconnected and siloed. Odds are, you’ve experienced the consequences: you and a colleague come to a meeting with different values for the same measure, and it’s back to the drawing board. These silos make the whole system slow and difficult to maintain and, worse yet, produce inconsistent data.

Only by storing data in one place can organizations dismantle these silos. Hadoop solves the data storage and processing issues, but modern visualization must be able to keep up. It should scale and take advantage of the flexibility that Hadoop provides with dynamic, constantly evolving schemas, while also enabling users to report at different granularities and share consistent results.
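To make the consistency argument concrete, here is a minimal sketch of why reporting from a single shared source keeps numbers aligned across granularities. The records, field names, and the `total_by` helper are all hypothetical and exist only for illustration; they are not part of any Arcadia Data or Trifacta API.

```python
from collections import defaultdict

# Hypothetical raw sales records, standing in for data kept in one shared store.
RAW_SALES = [
    {"region": "east", "month": "2016-10", "amount": 120},
    {"region": "east", "month": "2016-11", "amount": 80},
    {"region": "west", "month": "2016-10", "amount": 200},
    {"region": "west", "month": "2016-11", "amount": 50},
]


def total_by(records, key):
    """Sum the `amount` measure at the granularity named by `key`."""
    totals = defaultdict(int)
    for row in records:
        totals[row[key]] += row["amount"]
    return dict(totals)


# Two reports at different granularities, both computed from the same records.
by_region = total_by(RAW_SALES, "region")
by_month = total_by(RAW_SALES, "month")

# Because both views derive from one source, they roll up to the same total.
assert sum(by_region.values()) == sum(by_month.values())
```

When each report is instead built from its own separately loaded data mart, nothing guarantees that this final check holds, which is how two people end up with different values for the same measure.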

Arcadia Data provides the ease and convenience of a web-based visualization solution at scale, with sharing enabled across users. On top of that, Arcadia Data also allows users to visualize real-time streaming data and non-Hadoop data, all in one simple user interface.

On-cluster architecture

Like Trifacta, Arcadia Data has a pure on-cluster architecture that lets users access data in Hadoop directly, without copying it into a proprietary format. Better yet, non-technical users can do this work: with a simple, web-based visual interface, they can make sense of complex data without it ever leaving the cluster.

Arcadia’s Hadoop Native BI platform sits on every node of the distributed Hadoop cluster—right where the data lives—which eliminates the need for any additional hardware or complex software technologies and servers that legacy BI tools generally require. With this converged architecture, Arcadia Data:

  • Eliminates the need for, and cost of, data warehouses, data marts, cubing engines, and BI servers
  • Significantly enhances performance of analytical queries
  • Mitigates security risks, data leakage, and fragmented security inherent in moving data
  • Provides direct access to 100% of your Hadoop data (vs. extracting only smaller bits)

Leveraging Hadoop Native Security

Since Arcadia Data sits directly on the cluster, its visualization tier complies with the security and role-based data access protocols defined at the raw-data level within Hadoop. This is the same approach Trifacta has taken from the beginning by leveraging native Hadoop functions. Being native to Hadoop lets users inherit these critical features regardless of the Hadoop distribution.
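The idea of a visualization tier inheriting access rules from the data layer can be sketched roughly as follows. This is an illustration under stated assumptions, not how Arcadia Data or Hadoop implements security: the `STORAGE_GRANTS` mapping, role names, table names, and both functions are hypothetical.

```python
# Hypothetical role-to-table grants, standing in for permissions that are
# defined once at the storage layer rather than inside the BI tool.
STORAGE_GRANTS = {
    "analyst": {"sales", "web_logs"},
    "finance": {"sales", "payroll"},
}


def readable_tables(role):
    """Look up what a role may read; unknown roles get no access."""
    return STORAGE_GRANTS.get(role, set())


def render_dashboard(role, requested_tables):
    """Return only the tables the role may see.

    The dashboard keeps no access rules of its own; it defers entirely
    to the grants defined at the storage layer.
    """
    allowed = readable_tables(role)
    return [table for table in requested_tables if table in allowed]


visible = render_dashboard("analyst", ["sales", "payroll", "web_logs"])
```

The design point is that there is a single definition of who can see what, so a change to the storage-level grants takes effect in every dashboard without a second set of rules to keep in sync.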

Arcadia Data + Trifacta

Arcadia Data combines direct access to data in Hadoop with powerful, flexible, self-service visual analytics, while Trifacta applies the same guiding principles to data preparation. The combination of these two best-of-breed technologies, both engineered with Hadoop in mind, has proven successful with joint customers such as Kaiser, Nordea, Marketshare, and Teliasonera. With a best-of-breed data lake solution, our customers have significantly reduced the time, resources, and cost it takes to get from ingestion to insight delivered to end users.