Though “big data” has been a buzzword for quite some time, it’s easy to forget that the industry is still relatively new, with plenty of potential to grow: IDC predicts that big data and analytics sales will reach $187 billion by 2019. Fueling this growth are the forward-thinking data professionals who invest in new technologies, hone best practices, and drive new initiatives. In this series, we’ll sit down with a few of them for a unique perspective on their field. To start, we’ve asked William Foley, a big data professional based in Spain, to answer a few questions.
William Foley has extensive experience leading big data, digitization, and transformation projects within large multinational firms, mainly in financial services. His last role was as Chief Data Officer of a large multinational bank leading an innovative big data initiative.
1. Explain a little bit about how you approach big data as a Chief Data Officer.
As Chief Data Officer, my role is to look at the strategy of how data is managed. I play on both sides of the fence, helping to glue together the business and technical sides. In my opinion, big data cannot be implemented successfully without the active participation of all parties.
When I first look at big data, my key concern is how to make sure I don’t lose control of what each piece of data is, and when and where we hold it. It’s critical that data is anchored down at the beginning of its cycle, and that each stage it passes through is clearly defined and adhered to. The traffic needs to be managed in terms of both quality and strategy, which requires understanding from both the business and the technical perspective. From my perspective, you need tools to help with this understanding and organizing (wrangling) of the data, but also robust repositories that allow you to control where data is and to secure it.
In terms of the data strategy, my role also requires crucial decision-making. For example, I decide whether we begin to store and build up a history for a piece of data that we think could be valuable to the business, something that storage costs and a lack of tools would not previously have justified.
2. How have you seen the big data space evolve since you started working in the financial services industry?
The industry is more and more conscious of what big data can be used for and how it can help. Take traditional CRM, for example, where leveraging larger amounts of a greater variety of historical data can help generate new and more accurate insights. Another example is the greater ease with which technology allows organizations to work with “fast-data”, which adds enormous economic value.
What has enabled this, of course, is the enormous advance in available software, in addition to cheaper and more flexible infrastructure capacity and the progression of the methodologies applied in analytics. There’s been a lot of work and development around tools that make it easier for users to work with and understand data, which has helped its democratization. In addition, the availability of solutions such as Trifacta and the fast pace of development in Hadoop and other technologies allow for better control of the end-to-end data flow. That control is necessary not only for regulators (lineage/traceability), but also to scale out successfully from a business perspective (knowing what data I have and the state it is in).
3. What use-cases are most popular in the financial services industry, and what do you see similar organizations taking on in the future?
Today, most financial services use-cases are around micro segmentation, mapping relationships or networks, fraud detection, enhanced risk decisioning models and efficiency initiatives, such as hardware monitoring. I think going forward there will be an ever greater use of machine learning and predictive analytics to anticipate customer needs, much as Amazon and similar companies are doing today. That work can be applied to contact centers, online marketing, and financial markets, where more sophisticated prediction models can be used to time movements or micromanage treasury positions. In general, there’s a wealth of data and a huge number of scenarios in which to apply new technologies and techniques.
4. What types of data do you work with, and what is challenging about it? What types of data do you see rising in popularity in the future?
I have worked a lot with traditional structured data; however, working with big data has meant that I’ve also taken on new data types, especially unstructured ones: for example, working with technical logs from a business perspective. There are two key challenges with data of this nature. The first is understanding the data on a technical level as well as a business level. The second is getting it into a workable format, which is where a tool like Trifacta comes in.
Within the financial services industry, I see social networks rising in popularity, as banks are starting to use them more and more as a channel to engage with their customers. In branches, we will start to see sensor data (IoT). And as biometric security evolves, the industry will work more with very unstructured data like images. For example, some banks are already using images you capture on a smartphone to set up an account and identify yourself.
5. Do you see the onset of new types of data leading to the creation of new analytics initiatives, or do you see the desire to develop new data-driven insights lead to the sourcing of new types of data?
I think that it will always be a mix of the two for any business. It’s far easier than in the past to leverage all available data sets, and the available technology also makes it easier to work in a “laboratory” environment where requirements to source new data arise as hypotheses and models are developed. Regardless, I think that the key evolution in data is to look at it from an R&D perspective. Given that it is technically much easier to ingest new data sets, and that the tools for discovering and understanding the data are becoming more and more user-friendly, I think this flexibility will be a key lever for any digital transformation.
6. What has your Hadoop journey been like?
In my experience, there are two available routes for a Hadoop journey: one, build on a use-case basis or, two, prepare a data lake with all available data sets. Each has proven to have its pros and cons. For example, a complete data lake enables data scientists to investigate and test new models with all the data available rather than being pre-conditioned; in other words, a laboratory scenario. I have seen extremely interesting results achieved this way. On the flip side, however, this approach requires you to think carefully about your end strategy and build in the flexibility to adapt along the way. It also requires you to review and make decisions about the entire data management strategy. The use-case-driven strategy is also valid, serving perhaps more to “test the water” prior to launching a full big data project.
In addition, tackling a big data project with Hadoop today is much easier than it was 24 months ago, given the advances in security, lineage and audit of data, as well as the ever-growing number of user-friendly tools such as Trifacta, which integrate natively with this technology and enable both the democratization of data and the ever-important end-to-end governance.
7. What has your relationship been like with IT? How do you prioritize, communicate, delegate, work efficiently, etc.?
For me, the relationship between IT and the other areas of the business has changed dramatically. These sorts of projects require very close relationships in order to be successful, which is a clear break from the methods applied in the past in almost any organization, financial services or otherwise. I have led a big data project applying agile methodologies with extremely good results, both in terms of time to market and quality.
There’s a lot to be said for having a team from a variety of departments and with varying skills working together in the same project room. Communication is clearly much swifter with less risk of “interpretations,” but more importantly, business, data specialists, and IT are obligated to empathize with each other’s challenges. In my experience, the tools available, coupled with daily scrum meetings, make it much easier to delegate work and let each individual team member contribute their knowledge and expertise far more positively. Having everybody in the same room also means quick decisions can be taken when challenges arise and, above all, there is absolute transparency on all sides.
8. Based on your experience, how do you see modern data wrangling adding value?
The first clear benefit I’ve seen from my experience of modern data wrangling is the capability to take, for example, a weblog that’s a whole string of data and have a non-technical data analyst easily dissect and structure it. This relieves the data scientists of having to structure and organize that data and allows them to focus on the analytics. It also lightens the workload for the technical team, given that the tools being developed are native to new technologies like Hadoop; if this weren’t the case, IT would need to program each step of the dissection and structuring. User interfaces are much easier as well, meaning that somebody not so familiar with SQL or programming, for example, can do huge amounts of work based on drag-and-drop interactions.
For me, the biggest challenge in big data is understanding the data as it comes in. As soon as a new piece lands in Hadoop, a tool like Trifacta can be used to understand and structure it. There’s a business perspective required, not just a technical perspective, and Trifacta is key to providing that.
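To make the weblog example in this answer concrete, here is a minimal Python sketch of what “dissecting and structuring” a single log line can involve when it is programmed by hand. It is an editorial illustration only, not Trifacta’s method. The log layout (an Apache-style access log), the sample line, and the name `parse_weblog_line` are all assumptions, since the interview does not say which log format was involved.

```python
import re
from datetime import datetime

# Assumed layout: an Apache-style access log line. The interview does not
# name a format; this pattern is purely illustrative.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_weblog_line(line):
    """Split one raw log line into named, typed fields (hypothetical helper)."""
    match = LOG_PATTERN.match(line)
    if match is None:
        # The line does not follow the assumed layout; leave it for review.
        return None
    record = match.groupdict()
    # Convert the string fields into types an analyst can filter and sort on.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample line, for illustration only.
sample = (
    '203.0.113.7 - alice [10/Oct/2016:13:55:36 +0200] '
    '"GET /accounts/summary HTTP/1.1" 200 2326'
)
record = parse_weblog_line(sample)
print(record["host"], record["path"], record["status"])
```

Every field boundary and type conversion here had to be written out explicitly, which is the kind of step-by-step programming the answer describes IT otherwise having to do.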