Today, we are excited to formally announce our collaboration with IBM and the launch of a new jointly developed data preparation tool – IBM InfoSphere Advanced Data Preparation. Although this is the first time we’re speaking about the partnership publicly, today’s release is the result of tireless work across both Trifacta and IBM over the past several months to bring this new product to market.
As the company that created the data preparation category, we see this collaboration as an incredible milestone, not only for our team at Trifacta but also for the data prep market. When “Big Blue” makes a serious commitment to an area of technology, it signals tremendous market opportunity, both right now and in the future. It has been energizing to hear IBM’s perspective on how pervasive the data preparation bottleneck is across its customers’ data lake and data warehouse environments.
IBM shares our view that an increasing number of organizations are adopting DataOps practices in order to stay competitive. The tools, methodologies and organizational structures that businesses use for data management must evolve to improve the velocity, quality and reliability of analytics. As part of this evolution, we see the following trends playing out in the market. They are a big reason why IBM and Trifacta partnered to bring this new data prep solution to market.
Data is the Differentiator for AI
Organizations everywhere are investing in machine learning and AI across their businesses, and rightfully so. AI has the potential to fundamentally disrupt markets and transform businesses. However, most machine learning projects rely on the same algorithms. Whether you’re building models with TensorFlow, R, Python or some other ML tool, most of the algorithms are open source and available to anyone. What differentiates AI initiatives is the data that feeds these algorithms. Clean data is table stakes for any success with machine learning, but if you want to outpace your competition, you need differentiated data. As Megan Beck and Barry Libert eloquently state in their Harvard Business Review piece, “So, while there is a visible arms race as companies bring on machine learning coders and kick off AI initiatives, there is also a behind-the-scenes, panicked race for new and different data.” Through this collaboration with IBM, we’re excited to help organizations use IBM InfoSphere Advanced Data Preparation both to clean data and to create new, differentiated data sets for the AI initiatives they’re executing in Watson.
Symbiosis of Data Prep & Data Cataloging
The rise of data catalogs and the rise of data preparation have been intertwined over the past few years. Anyone who regularly works with data can speak to the challenge of finding the right data for the task at hand. Once you find the data, getting it clean and into the right format for your analytics work can be even more painful with traditional tools like Excel or hand-coded scripts. Worse still, cleaning up data in a data preparation tool creates a new derivative dataset that needs to be named, tagged and made accessible to the broader organization. The modern analytics workflow requires making it easier both to find data and to clean it up efficiently for your particular use case. This is exactly why pairing our joint data preparation solution with the IBM Watson data catalog will help customers not only find data but also wrangle it for use in analytics.
Modern DataOps Platforms Require ETL & Data Prep
We’ll never rid the world of one-off analyses performed in spreadsheets or desktop tools, but every organization is trying to. For data to be trusted, the data prep and analytics process must be governed. This is why organizations are investing in modern data lakes and data warehouses: they provide a centrally managed environment that blends self-service access for business users with governance from IT. To accomplish this, businesses are finding they need both ETL, to move diverse data sources into the lake or warehouse, and data preparation, to refine that data for analytics or machine learning. With IBM DataStage and IBM InfoSphere Advanced Data Preparation, organizations will be able to do exactly this and ensure the successful adoption of these platforms in the process.
As you can see, there is a lot of synergy between what’s happening in the market and this new collaboration with IBM. The response in early customer conversations has been tremendous, and we’re excited to watch our relationship grow and evolve as this market continues to do the same.