Walking the halls of financial services institutions for the last 8 years as a data strategy consultant has given me the opportunity to participate in a multitude of data projects. They range from standard data warehousing projects to efforts focused on creating a 360-degree view of the customer. More recently, CCAR and fraud reporting projects that draw on diverse Big Data sources have become increasingly popular in these institutions.
A couple of years ago, many of the clients I had been working with came to a realization: their existing enterprise data warehouse infrastructure could not cost-effectively handle the strain of maintaining vast amounts of data for both the current and future demands of financial regulation. The warehouse was simply going to become too big and too costly to scale to meet these growing requirements. A change needed to be made.
Meanwhile, in business units across these financial services institutions, analysts needed to incorporate new datasets that had never been leveraged before into their analytical and reporting models. The data feeding these analytic projects was becoming more diverse in both content and scope. As a result, the people who knew the data best were forced to adopt costly and inefficient practices to produce the analytics required to satisfy both regulatory and organizational needs.
About a year ago, I was instructing a leading financial services group on how to manipulate and clean some particularly nasty datasets for a Fraud Compliance and Reporting data warehouse project. These folks knew their organization’s business data inside and out. However, they were receiving new datasets from a group of third-party institutions that needed to be incorporated into their analyses. As luck would have it, at lunch an executive emailed one of the analysts asking for some of the new “blended” data we had been discussing, for a report to be presented at the end of the day. That report would require blending in the new data, which everyone involved knew was poorly understood and a real headache to incorporate.
We thought this would be the perfect chance to try a new tool we were working on against a real use case. However, the reply from one of the individuals involved in the project was, “No, I’ll just spin up the data in Teradata and take care of this in SAS. It will be quicker.”
“Quicker,” it turned out, meant that this individual left the session at noon and never returned: the task took 4 ½ hours to complete in that environment.
That night at dinner, I spoke with the director in charge of the analytics team. She told me that preparing the datasets her team used for analytics and statistics, which came from all over the company, kept her up at night. Her team was spending far too much time in the preparation phase and too little in the analytic and statistical phases where they needed to show value. And this wasn’t all. Her team also carried a cost burden: each time an analyst simply “spun up” a Teradata dataset to solve an issue, that dataset was usually abandoned once the task was done, left on the Teradata instance rather than reused or repurposed. The resulting storage costs were unsustainable for her unit. She, too, recognized that there had to be another way.
I have seen this story echoed in many of the institutions I have worked with over my career. It is what led me to join Trifacta.
I saw the value in being able to take a raw dataset and “wrangle” it using a new visual paradigm that incorporates Predictive Transformation to guide the user through the process, getting the job done in a fraction of the time it takes a team to build out data integration mappings. Combine this with the ability to generate new sets of information from that raw data and store it all in a Hadoop data hub such as CDH, and these teams gain a far more flexible and cost-effective way to innovate with their data while relieving the storage glut on their data warehouse appliances. It’s a win-win.