In this guest blog post, Blue Hill Research analyst James Haight provides his thoughts on the dramatic changes taking place in the information management industry. James recently co-authored a report detailing why MarketShare, a global leader in SaaS-based marketing analytics technology, chose Trifacta as a data wrangling solution to enhance data-driven innovation.
As market observers, we at Blue Hill have seen fundamental changes in the use of technology, such as the emergence of Bring Your Own Device, the progression of cloud from suspect technology to enterprise standard, and the assumption of ubiquitous, non-stop social networking and interaction. Each of these trends changed fundamental assumptions about how technology is used and brought market shifts in which traditional players ceded ground to upstarts and new market entrants.
Blue Hill believes that the tasks of data preparation, cleansing, augmentation, and governance face a similar shakeup, one that will fundamentally change the choices enterprises make. This shift is driven by five concurrent trends:
- Formalization of Hadoop as an enterprise technology
- Proliferation of data exchange formats such as JSON and XML
- New users of data management and analytics technology
- Increased need for data quality
- Demand for Best-in-Breed technologies
First, Hadoop has started to make its way into enterprise data warehouses and production environments in meaningful ways. Although the hype of Big Data has existed for several years, the truth is that Hadoop was mainly limited to the largest of data stores back in 2012, and enterprise environments were spinning up Hadoop instances as proofs of concept. However, as organizations have seen the volume of relevant data requested for business usage increase by an order of magnitude across customer data, partner data, and third-party sources, Hadoop has emerged as a key technology for simply keeping pace with the intense demands of the “data-driven enterprise.” This need for volume means that enterprise data strategies must include both the maintenance of existing relational databases and the growth of semi-structured and unstructured data that must be ingested, processed, and made relevant for the business user.
Second, with the rise of APIs, data formats such as JSON and XML have become key enterprise structures for exchanging data of all shapes and sizes. As a result, Blue Hill has seen a noted increase in enterprise requests to cleanse and support JSON and other semi-structured data within analytic environments. Otherwise, this data remains merely descriptive information rather than analytic data that can provide holistic, enterprise-wide insights and guidance. To support the likes of JSON and XML without resorting to a manual development approach, enterprise data management requires investment in tools that can quickly contextualize, parse, and summarize these data strings into useful data.
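To make the parsing challenge concrete, here is a minimal sketch of what turning a nested JSON payload into an analysis-ready flat row involves, using only Python's standard library. The payload and field names are hypothetical, and real wrangling tools do far more (type inference, schema discovery, error handling at scale); this only illustrates the basic contextualize-and-parse step.

```python
import json

def flatten(record, parent_key="", sep="."):
    """Recursively flatten a nested JSON object into a flat dict
    whose keys are dotted paths, e.g. {"user": {"id": 1}} -> {"user.id": 1}."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

# A hypothetical API payload of the kind described above.
payload = '{"user": {"id": 42, "name": "Ada"}, "event": "login", "meta": {"ip": "10.0.0.1"}}'
row = flatten(json.loads(payload))
print(row)  # {'user.id': 42, 'user.name': 'Ada', 'event': 'login', 'meta.ip': '10.0.0.1'}
```

Each flattened row can then feed a relational table or BI tool directly, which is exactly the semi-structured-to-analytic transition described above.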
Third, it’s hard to ignore the dramatic success of self-service analysis products such as Tableau and the accompanying shift in users’ relationship with data. Users of data management and analytics technology have spread beyond the realm of IT and are now embedded in core roles within various business groups. Traditional technology vendors must adapt to these shifts in the market by focusing on ease of use, compelling future-facing roadmaps, and customer service. This change is happening already. Consider the nearly $10 billion recently spent to take two traditional market leaders, TIBCO and Informatica, private. With this change, the world of information management will most likely be opened up in unpredictable ways for upstarts that are natively built to support the next generation of data needs.
Fourth, companies are finally realizing that Big Data does not obviate the need for data quality. Several years ago, there was an odd idea that Big Data could stay dirty because the volume was so large that only the “directional” guidance of the data mattered and the statistical magic of data scientists would fix everything. As Big Data has increasingly become Enterprise Data, or just plain old data, companies now find that this is not true and, just as with every other computing asset, garbage in is garbage out. With this realization, companies must figure out how to refine the Big Data of the past five years from coal into diamonds by bringing it to enterprise-grade accuracy and cleanliness. This requires the ability to cleanse data at scale, with tools usable not just by expert developers and data scientists but by data analysts and standard developers as well.
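The kinds of quality rules involved can be sketched in a few lines. The example below is a toy illustration, not any vendor's implementation: it trims stray whitespace, normalizes a hypothetical email field, coerces numeric strings, and drops records that become duplicates after cleaning, which is how “directionally similar” dirty rows often turn out to be the same row.

```python
def clean_records(records):
    """Apply basic data-quality rules: trim whitespace, lowercase emails,
    coerce numeric strings to numbers, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        fixed = {}
        for key, value in rec.items():
            if isinstance(value, str):
                value = value.strip()
                if key == "email":
                    value = value.lower()
                elif value.replace(".", "", 1).isdigit():
                    value = float(value) if "." in value else int(value)
            fixed[key] = value
        fingerprint = tuple(sorted(fixed.items()))  # duplicate check after cleaning
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(fixed)
    return cleaned

raw = [
    {"email": "  Ada@Example.COM ", "orders": "3"},
    {"email": "ada@example.com", "orders": "3"},  # duplicate once cleaned
]
print(clean_records(raw))  # [{'email': 'ada@example.com', 'orders': 3}]
```

At enterprise scale the same logic runs as distributed transformations rather than a Python loop, but the principle, cleanse first so downstream analysis is not garbage out, is identical.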
Finally, the demand for best-in-breed technologies is only increasing with time. One of the most important results of the “there’s an app for that” approach to enterprise mobility is the end user’s growing demand for instant access to the right tool at the right moment. For a majority of employees, it is not satisfactory to simply provide an enterprise suite or to take a Swiss Army knife approach to a group of technologies. Instead, employees expect to switch back and forth between technologies seamlessly, and they don’t care whether their favorite technologies come from a single vendor or a dozen. This expectation for seamless interoperability forces legacy vendors either to put every capability on a best-of-breed roadmap or to lose market share as companies shift to vendors that provide specific best-in-class capabilities and integrate with other top providers. This expectation is a straightforward result of the increasingly competitive and results-driven world we all live in, where employees want to be more efficient and to have a better user experience.
As these market pressures and business expectations all create greater demand for better data, Blue Hill expects that the data management industry will undergo massive disruption. In particular, data preparation represents an inordinate share of the time spent on overall analysis. Analysts spend the bulk of their workload cleaning, combining, parsing, and otherwise transforming data sets into digestible inputs for downstream analysis and insight. As organizations deal with an increasingly complex data environment, the time spent getting data ready for analysis is expanding and threatening to overwhelm existing resources. In response to these market forces, a new class of solutions has emerged, focused on the data “wrangling” or “transformation” process. These solutions use machine learning, self-service access, and visual interfaces to simplify and expedite analysts’ ability to work with data even at the largest of scales. Overall, there is an opportunity for IT orchestrators to bring this new category of tools into their arsenal of best-in-breed solutions.
Companies that are ready to support this oncoming tidal wave of data change will be positioned to support the future of data-driven analysis and change. Those that ignore this inexorable set of trends will end up drowning in their data or losing control of their data as growth outstrips governance.