Not every data problem is a ‘Big’ Data problem
Both regulators and senior management expect banks to get more from their data. The range of information available from both internal and external environments has grown dramatically in recent years, presenting unprecedented opportunities for quantitative risk assessment and decision-making. Yet at the same time, banks are struggling both to manage expanding data volumes and to derive meaningful insight from increasingly “noisy” information. Data management, once viewed as purely an IT issue, is now increasingly recognised as an area in which effective solutions depend on close collaboration between business and technology communities.
These challenges are driving significant interest in the use of “Big Data” in financial services. By dividing data storage and query processing into chunks that can run on cheap, scalable hardware, Big Data solutions open up previously unaffordable opportunities to mine vast volumes of raw data.
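As a rough illustration of that divide-and-conquer principle, the sketch below uses Apache Spark purely as an example engine; the file location and column names are hypothetical, and a real deployment would differ.

```python
# Minimal sketch of distributed storage plus parallel query processing,
# assuming Apache Spark (pyspark). Paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trade-volume-sketch").getOrCreate()

# The raw files are split into partitions and spread across whatever
# commodity hardware the cluster provides.
trades = spark.read.csv("hdfs:///raw/trades/*.csv", header=True, inferSchema=True)

# The aggregation is planned as parallel tasks, one per partition,
# with the partial results merged into a final answer.
daily_volume = (
    trades.groupBy("trade_date")
          .sum("notional")
          .orderBy("trade_date")
)
daily_volume.show()
```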
The prospect of next-generation analytics capabilities, with minimal infrastructure costs, makes for a compelling proposition – one which software vendors have seized upon with a growing number of Big Data platforms looking to enter the banking sector.
Whilst the interest is clearly fuelled by the prospect of a competitive advantage in a cost-pressured environment, banks should be prepared to challenge industry hype when making specific investment decisions. Big Data is not a generic “silver bullet” to address any data challenge; nor is it a one-size-fits-all solution, where the same approach works in any situation. The image of a bank’s entire data set flowing into a massive data lake, from which it can be used across any business for any purpose, makes for a compelling sales pitch – but it oversimplifies the business reality.
Consider the commonly-cited ‘3 Vs’ of Big Data – Volume, Velocity and Variety. The Big Data premise takes these as three demands to be met in parallel – i.e. to process high data volumes (> 1 petabyte) at high speed, whilst handling information in a wide variety of structured and unstructured forms. This is not always the case. There are indeed legitimate uses for “high variety” data in banking – the main ones cited being web-driven customer analytics for sales support, product development and credit risk profiling. However, there remain far more use cases for low-variety data – data which is uniform in structure, with each fact being easy to capture, substantiate and transform via standard, repeatable processes. This is most clearly evident in regulatory reporting: regulators want consistency and transparency in how their returns are sourced and prepared, and are unlikely to be satisfied by explanations that “it all just comes out of the data lake…”
We can therefore expect that in the near term, the majority of successful use cases will feature more of a hybrid approach. Mid-to-large volume data lakes can be built to leverage distributed storage and parallel processing, but applied to data models which are more traditionally structured at the logical level, based on pre-defined data requirements. Whilst it may not meet a purist definition of “Big” data, this may offer Big Data vendors a foothold from which to build up to more fully-fledged applications.
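To make that hybrid approach concrete, the sketch below shows distributed storage and parallel processing combined with a pre-defined logical model, again assuming Apache Spark as an illustrative engine; the schema, path and field names are hypothetical.

```python
# Minimal sketch of a "hybrid" data lake: distributed storage and processing,
# but with a traditionally structured, pre-agreed logical model.
# Assumes Apache Spark (pyspark); schema, path and field names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType, DecimalType

spark = SparkSession.builder.appName("hybrid-lake-sketch").getOrCreate()

# The logical model is defined up front from agreed data requirements,
# rather than inferred from whatever happens to land in the lake.
exposure_schema = StructType([
    StructField("counterparty_id", StringType(), nullable=False),
    StructField("as_of_date",      DateType(),   nullable=False),
    StructField("exposure_amount", DecimalType(18, 2), nullable=False),
])

# Records that do not match the schema cause the read to fail (FAILFAST),
# supporting the standard, repeatable sourcing that regulatory reporting
# expects, while storage and processing remain distributed.
exposures = (
    spark.read.schema(exposure_schema)
         .option("header", True)
         .option("mode", "FAILFAST")
         .csv("hdfs:///lake/credit/exposures/")
)
exposures.groupBy("as_of_date").sum("exposure_amount").show()
```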
Even then, banks shouldn’t forget to ask the question, “Do I need a ‘Big’ Data solution at all?” It’s not uncommon for direction on the target solution to be set by the IT community before data requirements are well understood, and we may yet see examples where the Big Data route delivers poor return on investment compared to more traditional options. Navigating this decision requires both the owners of the business need, and the IT community supporting them, to share a clear understanding of the problem to be solved.
Big Data is already regarded as a significant tool in banks’ technology arsenal – but it will be this partnership between the business and technology that determines where it’s able to add real value.