Best-of-Breed Data Cataloging and Data Wrangling: A Match Made in Heaven

June 8, 2017

At Trifacta, we’re focused on data wrangling, the process of converting diverse, raw, messy data into a workable format for analysis. We also know that wrangling is only one component of the modern analytics ecosystem. As this ecosystem has taken shape (one driven by self-service, which Trifacta helped establish), data catalogs have emerged to solve a significant challenge. Amidst ever-growing repositories of data, data catalogs allow analysts to pinpoint the exact data they need and understand where it sits in the pipeline, which increases both the value they get from their data and their ability to govern it.

Data Wrangling + Cataloging = Analytics Success
The data wrangling process is tightly coupled with data catalogs because Trifacta supplies some of the most critical metadata to the catalog. Trifacta allows the users with the best domain knowledge of the data to wrangle it themselves, and their unique insights help curate contextual and relevant metadata. Trifacta automatically generates metadata in context while users structure, transform, and cleanse their data, which improves ease of use and productivity within the product. This work can then pass through to the catalog to benefit other users across the organization.

While there is advantageous overlap between Trifacta and data cataloging tools, we’ve made a conscious effort to partner with best-of-breed data cataloging vendors instead of building our own application-specific catalog solution. Not only has this allowed us to concentrate solely on building a rich data wrangling experience (which, in and of itself, is a thorny problem to solve), but it also ensures analysts can focus on establishing a centralized catalog of their enterprise’s data assets. Effective data cataloging requires the ability to tap into every repository and application, and that ability would be lost if the catalog were application-specific. By leveraging best-of-breed data wrangling and cataloging technologies, users avoid metadata silos and can instead track data lineage through the entire data lifecycle, from source repository through wrangling to analytics.
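To make the lineage idea concrete, here is a minimal sketch of how separate tools might report steps to one shared catalog so that a dataset can be traced back to its source. Everything in it is hypothetical and for illustration only: `LineageEvent`, `Catalog`, the dataset names, and the tool names are assumptions, not Trifacta's API or any catalog vendor's.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LineageEvent:
    """One step in a dataset's life, as a tool might report it (hypothetical)."""
    dataset: str   # name of the dataset produced by this step
    stage: str     # e.g. "source", "wrangling", "analytics"
    tool: str      # application that produced the dataset
    inputs: List[str] = field(default_factory=list)  # upstream dataset names


class Catalog:
    """Hypothetical in-memory stand-in for a centralized data catalog."""

    def __init__(self) -> None:
        self._events: Dict[str, LineageEvent] = {}

    def register(self, event: LineageEvent) -> None:
        # Each tool reports its own step; the catalog keeps them in one place.
        self._events[event.dataset] = event

    def lineage(self, dataset: str) -> List[str]:
        """Walk upstream and return 'stage:dataset' entries, source first."""
        event = self._events[dataset]
        trail: List[str] = []
        for parent in event.inputs:
            if parent in self._events:
                trail.extend(self.lineage(parent))
        trail.append(f"{event.stage}:{event.dataset}")
        return trail


catalog = Catalog()
catalog.register(LineageEvent("raw_orders", "source", "warehouse"))
catalog.register(LineageEvent("clean_orders", "wrangling", "wrangling_tool", ["raw_orders"]))
catalog.register(LineageEvent("orders_dashboard", "analytics", "bi_tool", ["clean_orders"]))

print(catalog.lineage("orders_dashboard"))
```

Because every tool writes to the same catalog rather than its own store, the trail crosses application boundaries, which is the point of avoiding metadata silos.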

Customer Benefits
With a best-of-breed approach, our customers receive the best wrangling experience on the market, and also benefit from the ability to leverage an intuitive data catalog to easily discover relevant data sets for whatever project they’re working on. As a result of an integrated ecosystem of tools, these catalogs provide access to extremely valuable context and metadata across the totality of an organization’s data assets to help users better understand the data available to them and how it originated.

This is widely regarded as a best practice for the industry as a whole. Across the board, businesses are investing in best-of-breed technologies and open standards, which encourage greater data access across users and applications. Instead of instituting top-down infrastructure management, where data assets are “locked down” and not made widely available to users, IT organizations are developing grassroots approaches to data management. These approaches encourage secure self-service across every aspect of the analysis process (data wrangling, analysis, and cataloging) and lean heavily on transparently tracking data lineage across each application or process. This is a win-win for the business as a whole: data is more accessible (and therefore more valuable) to business users, while IT is able to maintain security and governance in collaboration with its business counterparts.

Commitment to Being Open
Trifacta offers open APIs and bi-directional metadata sharing with a variety of analytics, data catalog, and data governance applications. Our strong belief in open source metadata management has even led Trifacta co-founder and UC Berkeley professor Joe Hellerstein to invest in building Ground, a vendor-neutral metadata services layer: a platform for metadata storage that will sit underneath catalog applications and other metadata value-add services. In the short twelve months that Ground has been active, it has gained significant interest from Cloudera, Capital One, and LinkedIn, among others, and we’re excited to see it grow as an important addition to the community.

Ground's rapid momentum is another leading indicator that the data management landscape is evolving very quickly, and we’re confident that Trifacta is prepared to adapt to and integrate with new innovations.

To learn more about how Trifacta was built to integrate with data cataloging tools, please visit our architecture page.