The Big Data Tidal Wave

By Tony Agresta

Technology to store, manage, search and analyze Big Data leaps to the top of the agenda for Financial Institutions as enterprise NoSQL databases come of age.

Financial institutions are focused on initiatives to survive in a world where regulatory pressure, risk mitigation and growing volumes of data continue to strain legacy infrastructure. Improved operational efficiency and revenue generation are at the forefront of the agenda.

Specific areas of focus vary by region. Common strategic initiatives for 2012 and 2013 include:

• Infrastructure Improvements – Current IT infrastructure needs an overhaul if financial institutions are to respond to increasing regulatory pressure, new product innovation and huge volumes of complex data. By-products of this change include, but are not limited to, improved data analysis, reporting, data visualization, business intelligence and predictive analytics.
• Data Proliferation – The growth in mobile applications and social media presents unique challenges. As customer touch points expand, new sources of data with new forms of complexity proliferate. Initial approaches to data storage have resulted in silos, leading banks to consider innovative approaches to data consolidation.
• Operational Efficiency – Eroding profit margins, a reliance on legacy systems that can’t handle the load and the need for real-time applications have led financial institutions to focus on operational efficiency. As they make improvements in this area, resources are being shifted to revenue-generating projects.
• Data Security and Scalability – In the quest for security and scalability, CIOs and IT leaders are focusing on technology deployments where reliability and uptime are paramount. In turn, high-performance transactional environments that utilize all forms of data will allow financial institutions to compete effectively against more nimble players.

The Data Revolution

While this sea change takes place, a data tidal wave has saturated current systems. Characterized by monumental volumes of complex data flooding in from every channel, the wave continues to impact existing infrastructure. Research documents, real time market data, sensor data, geospatial data, social media, video, mobile transactions – the complete digital universe presents major challenges for existing systems and legacy databases. Organizations are scrambling in search of solutions to harness complex, unstructured data that traditional relational database technology was never designed to handle.

Consider the pace at which this “Big Data Tidal Wave” is progressing. The research firm IDC recently noted that by 2020, the need for virtual and physical servers will rise by a factor of 10. With 50 times the amount of data and 75 times more files, IT will need to dramatically expand its administrative capabilities. In 2010, 1,000 organizations were storing one petabyte or more of data. In less than eight years, by 2020, the number of organizations storing one or more petabytes will skyrocket to 100,000, a hundredfold increase. A significant portion of this data is unstructured, requiring new processes and technologies to transform raw data into actionable intelligence.

Recent estimates from IDC, McKinsey and the Bureau of Labor Statistics show that financial services companies with 1,000+ employees (including banking, securities and investments) lead all industry sectors in data volume, storing 5,800 terabytes of data on average. The next closest industry is Media and Communication with 1,800 terabytes – still a large number, but less than a third as much.

Historically, financial services firms led the pack when it came to capturing and applying transactional data. Today, they realize that highly valuable unstructured data can be used in conjunction with structured information to improve business processes.

A New Approach to the Problem

Leveraging all the data in support of smarter decisions will drive business growth. But do financial institutions have the right infrastructure in place to meet the challenge? Traditional tools, databases and systems were not architected to handle massively complex data streaming from new channels and touch points.

At the root of the problem is the fact that relational tools are inherently fragile, requiring IT to constantly build and maintain complex relational schemas. New financial products that rely on social and mobile data, research documents and other forms of unstructured data cannot be supported on this older technology. Many of the new applications needed to compete in a dynamic market require real-time operational updates. To meet these challenges, financial institutions need an agile approach built on a flexible technology infrastructure that lets IT add data sources and real-time access without modifying data schemas.

The winners are implementing solutions to store, manage, search and analyze any number of data sources. In many cases, these solutions solve mission-critical operational problems, allowing companies to create new products, react immediately to consumer demand, anticipate customer service requirements, identify risk and predict trading outcomes. At the core of these solutions is a powerful mix of information intelligence, real-time access, alerting and data enrichment.

What’s the end game? In some cases, it’s complex data that can be searched instantly with results presented in charts, tables, graphs and timelines to report on changing market conditions, portfolios and risk models. In other cases, it’s transactional systems for derivatives trading where millions of read-write change requests are applied in a workflow environment. Other financial institutions need to support hundreds of research analysts focused on equities or fixed income research. They need to search many sources of data in dozens of formats to find relevant research, analyze results and reuse the assets. Many of these applications have a common thread – applying real time insights driven by a variety of data stored across the enterprise.

In most cases there’s no need to “rip and replace” existing database technology to achieve these goals. While relational database systems don’t have the ability to handle complex unstructured data, they are well suited for managing information that fits in rows and columns. Newer approaches that utilize an enterprise-class NoSQL database allow companies to store and manage unstructured information. These systems can co-exist with existing systems and databases while breaking down silos. Transaction data can be used in conjunction with unstructured data as users iterate on the data to build applications. As data evolves, the database evolves with it.

Big Data Success – A Single Source of the Truth

Fortunately, solutions do exist. They allow for improved risk management, enhanced regulatory reporting, smart analytics and a shift toward transparency. Financial institutions can leverage all of their data in a secure, reliable environment where new sources of data come online frequently. These solutions provide for agile product innovation in weeks, not months, driving higher levels of customer satisfaction and revenue.

Today, advances in technology have allowed financial institutions to scale out on commodity hardware, a benefit that supports large-scale expansion as data volumes and data complexity increase. Real-time access to mission-critical data, coupled with alerting, allows banks, brokerages and investment organizations to react immediately to market fluctuations while reducing portfolio risk. Full-text search and the ability for IT to add new data easily hold great promise as financial institutions reinvent their business processes.
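Alerting can be pictured as stored queries that are checked against each new piece of data as it arrives, instead of users repeatedly polling the database. The sketch below is a minimal illustration under stated assumptions: the `Alert` and `AlertEngine` classes and the `instrument` / `price_move_pct` fields are hypothetical, and this is not MarkLogic’s API or any product’s actual alerting mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical document shape: a plain dict, e.g. a market data tick.
Document = dict

@dataclass
class Alert:
    name: str
    matches: Callable[[Document], bool]  # stored predicate, evaluated on ingest

@dataclass
class AlertEngine:
    alerts: list[Alert] = field(default_factory=list)
    fired: list[tuple[str, Document]] = field(default_factory=list)

    def register(self, alert: Alert) -> None:
        self.alerts.append(alert)

    def ingest(self, doc: Document) -> list[str]:
        """Check every registered alert against the new document as it arrives."""
        hits = [a.name for a in self.alerts if a.matches(doc)]
        for name in hits:
            self.fired.append((name, doc))  # keep a log of triggered alerts
        return hits

engine = AlertEngine()
engine.register(Alert("large-move", lambda d: abs(d.get("price_move_pct", 0.0)) >= 2.0))
engine.register(Alert("watchlist", lambda d: d.get("instrument") in {"XYZ", "ABC"}))

print(engine.ingest({"instrument": "XYZ", "price_move_pct": -3.1}))  # ['large-move', 'watchlist']
print(engine.ingest({"instrument": "QQQ", "price_move_pct": 0.4}))   # []
```

This version simply loops over every stored alert for each document; a system handling many thousands of alerts would typically index the stored queries themselves so that only candidate alerts are evaluated.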

Consider the following customer case study: one of the world’s top derivatives trading banks made a strategic decision to review its global technology infrastructure to ensure cost and operational efficiency. One major finding of this initiative was the recognition of significant risk exposure the Bank was assuming in its derivatives trading operations. Existing systems were based on different relational databases scattered across servers around the globe and were unable to provide the Bank with a real-time global view of its overall position. The current systems would not be able to handle the projected growth in data or meet the need for real-time risk monitoring. In addition, the Bank was allocating valuable resources to maintenance and risk reporting – resources that could be applied to revenue generation. Management recognized it had to seek alternative solutions, and a project team was quickly organized.

While both internal and external solutions were typically part of the review process for new systems, the Bank’s culture was such that the winning system was always the in-house solution. The project team concentrated on both cost- and performance-based comparisons, with the latter focused on scalability and the demand for 24 x 7 availability. As the review process progressed, it became obvious that only one system could meet all the requirements the project team had established and also provide the agility to handle future, unknown requirements.

The solution was a single operational trade store built on MarkLogic, which eliminated more than 20 systems scattered across the world and enabled all derivatives to be housed in one location. This system was also architected to handle the Bank’s securities processing, which has significantly different characteristics and demands than the highly complex data associated with derivatives trades: securities processing requires the ability to handle high volumes of transactions with low data complexity. The contrasting nature of the different asset classes meant that developing a platform to serve both was a huge challenge, not only technically but also culturally, as each group was accustomed to managing its own systems.

Mergers and acquisitions had generated a large volume of related data, which needed to be migrated as part of the core platform. This solution allowed the bank to consolidate all systems into one while lowering the cost per trade.

The business value to the Bank exceeded expectations. The total cost of ownership of its processing platforms was reduced. Cost per trade was reduced. A single copy of transaction data for its global portfolio can now be accessed 24 x 7 with no downtime for maintenance. The complexity associated with regulatory reporting was dramatically reduced and the bank is well positioned to handle additional compliance requirements in the future.

This very same platform could be used for other operational systems across bank divisions in the future. Today, the resources saved have been applied to new revenue generating projects.

Essential Attributes in an Enterprise NoSQL Database

What’s at the core of a big data technology solution like this? An enterprise NoSQL database that is powerful, accessible and trusted. Success depends on many factors, including the reliability, scalability and security needed to handle mission-critical data.

Different stakeholders have different requirements. Business leaders need applications that provide value, allowing them to compete against new players and financial giants. For example, real time, multi-lingual full text search powers applications that deliver complete, accurate results. Alerting, event processing, analytic functions and packaged integration with open source enhance these applications.

Information technology professionals have a different set of requirements, including the need to leverage existing tools and resources. APIs, integration with business intelligence tools, interfaces for building applications and connectivity to Hadoop all allow IT and developers to apply existing knowledge and experience, supporting shorter deployment lifecycles and reduced maintenance. With monitoring, management and support across operating systems, enterprise NoSQL databases have the flexibility that less robust, newer technologies lack. Access to existing data sources means that applications can be created quickly and easily using tools already in place.

Recent growth in open source has led some businesses in the direction of free databases as they attempt to cut costs. But is an open-source, NoSQL approach the answer when security, reliability, scalability and other features are required to solve the problem?

In addition to the attributes mentioned above, buyers looking for technology to handle a variety of data formats and data types, including unstructured data, should consider replication capabilities, point-in-time recovery and automated failover. Database rollback, distributed transaction processing, backup/restore and role-based security are required for mission-critical big data solutions. The reality is that many databases compromise on the ACID properties that are important in transaction processing: atomicity, consistency, isolation and durability. Enterprise approaches to the problem have these characteristics built into the operational database.
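Atomicity, the “A” in ACID, is easiest to see with a small example: booking a trade and updating the related position must both succeed or both fail. The sketch below uses Python’s built-in `sqlite3` module purely as a convenient stand-in (it is a relational engine, not an enterprise NoSQL database); the `trades` and `positions` tables and the `book_trade` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite is only a stand-in to demonstrate all-or-nothing behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, instrument TEXT, qty INTEGER)")
conn.execute(
    "CREATE TABLE positions (instrument TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
)
conn.execute("INSERT INTO positions VALUES ('XYZ', 100)")
conn.commit()

def book_trade(instrument: str, qty: int) -> bool:
    """Record a trade and update the position as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO trades (instrument, qty) VALUES (?, ?)", (instrument, qty))
            conn.execute(
                "UPDATE positions SET qty = qty + ? WHERE instrument = ?", (qty, instrument)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed; the trade insert was rolled back too

print(book_trade("XYZ", -40))   # True: position becomes 60
print(book_trade("XYZ", -500))  # False: position would go negative
print(conn.execute("SELECT COUNT(*) FROM trades").fetchone()[0])  # 1
```

Without the transaction, the failed booking would leave an orphaned trade row with no matching position change, which is exactly the kind of inconsistency that complicates risk reporting.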

Stepping outside pure technical requirements, the fact that the enterprise NoSQL database is also “search oriented” gives it a distinctive quality, allowing rapid search against petabytes of information. To do this effectively, you need a solution that indexes data upon ingestion, giving users access to all of the data all of the time. Armed with these indexes, users can search unstructured text and structured data simultaneously. Because this class of technology is “schema agnostic,” it can ingest data from a wide variety of sources and query that data together in real time.
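The idea of indexing at ingestion time, then querying free text and structured fields together without a predefined schema, can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: the `SearchStore` class, its `ingest` / `search` methods and the sample fields are hypothetical, and real search-oriented databases add stemming, relevance ranking, persistence and much more.

```python
import re
from collections import defaultdict
from typing import Any

def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real engine would add stemming and language handling.
    return re.findall(r"[a-z0-9]+", text.lower())

class SearchStore:
    """Toy schema-agnostic store that builds its indexes at ingestion time."""

    def __init__(self) -> None:
        self.docs: dict[int, dict[str, Any]] = {}
        self.word_index: dict[str, set[int]] = defaultdict(set)  # word -> doc ids
        self.value_index: dict[tuple[str, Any], set[int]] = defaultdict(set)  # (field, value) -> doc ids

    def ingest(self, doc: dict[str, Any]) -> int:
        doc_id = len(self.docs)
        self.docs[doc_id] = doc
        for key, value in doc.items():  # no schema: index whatever fields arrive
            if isinstance(value, str):
                for word in tokenize(value):
                    self.word_index[word].add(doc_id)
            if isinstance(value, (str, int, float, bool)):
                self.value_index[(key, value)].add(doc_id)
        return doc_id

    def search(self, words: str, **filters: Any) -> list[int]:
        """Intersect full-text hits with structured (field == value) hits."""
        candidates = set(self.docs)
        for word in tokenize(words):
            candidates &= self.word_index.get(word, set())
        for key, value in filters.items():
            candidates &= self.value_index.get((key, value), set())
        return sorted(candidates)

store = SearchStore()
store.ingest({"type": "research", "sector": "energy", "body": "Oil price outlook and refinery margins"})
store.ingest({"type": "trade", "notional": 5_000_000, "counterparty": "Bank A", "note": "oil swap"})
store.ingest({"type": "research", "sector": "banks", "body": "Credit outlook for European banks"})

print(store.search("outlook"))               # [0, 2]
print(store.search("oil"))                   # [0, 1]
print(store.search("oil", type="research"))  # [0]
```

The research notes and the trade record have different fields, yet both are searchable the moment they are ingested, and a single query can combine a text term with a structured filter.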

It is essential that both public and private sectors embrace new approaches to databases, given the reality of enormous data volumes and workloads that will not slow anytime soon. The database market has changed, driven by advances in hardware technology and the availability of enterprise NoSQL databases. A specialized approach to handling complex unstructured data at scale is one of the advantages of this technology. Full-text search, real-time data ingestion, alerting, data enrichment and the ability to leverage existing tools have all allowed IT departments and developers to deploy mission-critical big data applications worldwide with ease. Unlike open source and 1.0 technologies just emerging on the market, true enterprise NoSQL databases are reliable, scalable and proven to work across industries, markets and hundreds of customers. This approach represents a viable big data platform for meeting the challenges posed by the big data tidal wave.

To learn more about how organizations are deploying this technology, including a comprehensive perspective on how database technology has evolved, get your complimentary copy of The Bloor Group’s white paper “The Database Revolution.” For additional information on MarkLogic, visit

About the author

Tony Agresta is vice president of worldwide field operations for MarkLogic. He has spent over 25 years in the packaged software and data markets in support of sales, marketing and product strategy.

About MarkLogic

MarkLogic is an enterprise software company powering over 500 of the world’s most critical Big Data Applications with the first operational database technology capable of handling any data, at any volume, in any structure.

The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of The World Financial Review.