Connecting to data is a fundamental feature of all data tools, which is why we’ve increased the range of connectivity options in our latest release. From our initial support for cloud-based systems such as Amazon Redshift, Amazon S3, and Microsoft Azure Blob Storage to our more recent support for common relational systems such as Oracle, Teradata, and SQL Server, we’ve been able to meet our customers’ diverse needs while deepening our support for HDFS and Hive/Impala.
But in the world of self-service data preparation, accessible data goes beyond having a long list of connectors; it’s just as important to empower nontechnical end users to access data themselves. We’ve seen this with customers time and time again: the most common pain point they encounter is not that they can’t access the data they need, but that technical barriers keep them from accessing it in a timely manner.
With our continued focus on the end-user experience, we’ve built our connectors in the same vein. Across all of our connectivity options, we ensure simple and immediate access to live data, delivering the first truly self-service data connectivity.
Reality of Traditional Data Connectivity
Imagine a typical scenario: an analyst needs to access data. The request is ad hoc because it depends heavily on the context of the task at hand, which makes it challenging for IT to respond quickly. A request that should take minutes turns into days because IT needs to requisition the data and make it available. Not only that, but the end deliverable is often not what the analyst wanted: without the full context of the request, the IT team can’t fulfill it accurately, and the whole process starts all over again.
What’s the reason for scenarios like these?
- The data you are interested in may be scattered across multiple files or tables, and it takes time (and multiple tries) for someone without context to figure out all the tables that need to be combined.
- Moving large amounts of data takes time to transfer and requires additional space to store, so IT needs extra justification to allocate both.
Both of these issues are compounded when you don’t know exactly what you’re looking for and need to iterate through multiple datasets to figure out what works best for your task.
Trifacta: Self-Service Connectivity
Trifacta addresses these two challenges by:
- Allowing end users to browse and preview all the data in a source system through a simple, user-friendly interface that doesn’t require technical syntax like SQL. This lets users quickly iterate through several files and tables to figure out which data they need on their own, without launching another tool or relying on another team’s help.
- Enabling live access to the data, not a cached copy that needs to be refreshed. This means data doesn’t need to be copied to another location before users can start to understand its usefulness; they can explore it immediately without consuming additional storage space.
Trifacta achieves both without overloading the source system with lengthy data loads, while still respecting the security rules IT has put in place.
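To make the browse-and-preview idea concrete outside of Trifacta’s interface, here is a minimal Python sketch. It is not Trifacta’s implementation: Python’s built-in sqlite3 module stands in for a relational source such as Oracle or SQL Server, and the function names `list_tables` and `preview_table` are hypothetical. The point it illustrates is that browsing reads the source’s own catalog and a preview fetches only a small sample straight from the live table, so nothing is copied into a separate store.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Browse (hypothetical helper): return the table names the source exposes."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def preview_table(conn: sqlite3.Connection, table: str, limit: int = 5) -> list[tuple]:
    """Preview (hypothetical helper): read a small sample from the live table.

    Only `limit` rows are fetched; no copy of the table is made.
    """
    # Identifiers can't be bound as query parameters, so check the name
    # against the catalog before interpolating it into the SQL text.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table!r}")
    return conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,)).fetchall()


if __name__ == "__main__":
    # An in-memory database stands in for a real source system.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, "west" if i % 2 else "east") for i in range(1000)],
    )
    print(list_tables(conn))  # ['orders']
    print(preview_table(conn, "orders", 3))  # three sampled rows, not all 1,000

    # Browsing queries the source every time, so a newly created table
    # appears immediately; there is no cached catalog to refresh.
    conn.execute("CREATE TABLE customers (id INTEGER)")
    print(list_tables(conn))  # ['customers', 'orders']
```

Bounding the preview with `LIMIT` is what keeps exploration from turning into a lengthy data load on the source system; a real connector would also authenticate as the user so the source’s own permissions still apply.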
Leveraging Cloud, Relational & Hadoop Data in Trifacta
In Trifacta, connecting to data is the first step in your wrangling journey. To make it as simple and responsive as possible, we have a single interface that lets users browse all their data sources, whether it’s Hive on Hadoop, files on S3, or relational systems such as Oracle and Teradata. Every user can browse the source systems, preview their contents, and create a dataset they can immediately start wrangling, all in a matter of seconds.
Using Trifacta’s automatic type inference and parsing, or its full set of structuring transformations, users can easily combine data across all these different sources within Trifacta. For example, users can cleanse XML data stored in cloud file systems using relational reference data, or blend Hive and SQL Server tables.
Trifacta provides users with immediate access to the data for wrangling, so it is easy to maintain the context of the task at hand. When users need additional data, they can easily browse and start working with new data without any additional involvement from IT. Once users find the data they want and have wrangled it enough to know it meets their needs, they can confidently operationalize the end-to-end flow.
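In Trifacta this blending happens through the interface rather than in code; the Python below is only a generic sketch of the same idea, under stated assumptions. It assumes hypothetical customer records exported as XML with a country code, and a hypothetical relational `countries` reference table, with sqlite3 standing in for a relational source such as SQL Server. The XML is parsed, the codes are normalized, and each record is enriched from the reference data, leaving unmatched codes visibly empty.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample: customer records stored as XML in a cloud file system.
CUSTOMERS_XML = """<customers>
  <customer><id>1</id><country> us </country></customer>
  <customer><id>2</id><country>DE</country></customer>
  <customer><id>3</id><country>xx</country></customer>
</customers>"""


def parse_customers(xml_text: str) -> list[dict]:
    """Parse XML into records, normalizing country codes (trim + uppercase)."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(c.findtext("id")),
            # findtext returns None when the element is missing.
            "country": (c.findtext("country") or "").strip().upper(),
        }
        for c in root.iter("customer")
    ]


def cleanse(records: list[dict], conn: sqlite3.Connection) -> list[dict]:
    """Enrich each record from the relational reference table.

    Codes with no match get country_name=None so they can be reviewed.
    """
    lookup = dict(conn.execute("SELECT code, name FROM countries"))
    return [{**r, "country_name": lookup.get(r["country"])} for r in records]


if __name__ == "__main__":
    # In-memory stand-in for relational reference data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO countries VALUES (?, ?)",
        [("US", "United States"), ("DE", "Germany")],
    )
    for row in cleanse(parse_customers(CUSTOMERS_XML), conn):
        print(row)
```

Here the messy code " us " is normalized to "US" and matched, while "xx" has no reference entry and comes through with `country_name` set to `None`, which is the kind of issue a user would then fix while wrangling.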