Exploring The Cost of Dirty Data in Banking


By Michelle Knight

Good people with good data continue to revolutionize banking, providing customers with better payment experiences. While trustworthy data promises better all-around banking transactions, the cost of dirty data, meaning incomplete and inaccurate information, remains a substantial threat, not only to the financial institution responsible for it but also at a greater macroeconomic level.

Dirty data costs banks 15% to 25% of revenue, according to the MIT Sloan Management Review (MIT SMR). At a national level, dirty data contributed to $112 billion of fraudulent mortgage loans in the United States between 2005 and 2007, and to class action lawsuits in 2020. Yet these fiscal damages are easily overshadowed by the loss of confidence, among consumers and bankers alike, in the financial data available.

So, what is dirty data in banking? Where does it come from? Why do good people end up making poor decisions with dirty data in banking?

Dirty Data Comes to the Forefront

Dirty data, inadequate data that breaks financial processes, stands out most prominently when a crisis occurs and a bank finds itself in the middle of an economic meltdown or a pandemic economy.

The institution assesses the damage. Good people take steps to remediate the issue, including getting in touch with customers to inform them of their options, responding to their queries, and sorting out their assets and debts.

Then these good bankers find they cannot communicate with customers because of bad contact information, or cannot figure out how to advise them because they lack confidence in what the customer owes. These headaches stem from duplicate, inconsistent, and incorrect information: dirty data spread through many bank systems.

Example of Dirty Data in Banking

The global financial meltdown in 2008 provides a striking example of the cost of dirty data in banking. Sub-prime mortgage investors could purchase a loan pool based on a lender’s Microsoft Excel spreadsheets. But the lack of standardization in creating, using, and testing this spreadsheet data led good bankers to take on unacceptable risks.

Unaware of these dirty data banking problems, bankers used the information to sell and price loans, in addition to setting aside reserves for potential losses. In sampling a data set of 1,000 loans with 150 fields, TowerGroup found that 5-10% of the critical data used to analyze credit and price loans was missing.
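The kind of sampling TowerGroup performed can be sketched in a few lines of code. This is an illustrative sketch only: the field names, records, and helper function below are hypothetical, not TowerGroup's actual methodology.

```python
# Hypothetical critical fields a lender would need to price a loan.
CRITICAL_FIELDS = ["borrower_name", "loan_amount", "interest_rate", "fico_score"]

def missing_rate(loans, fields=CRITICAL_FIELDS):
    """Return the share of critical field values that are absent or blank."""
    checked = missing = 0
    for loan in loans:
        for field in fields:
            checked += 1
            value = loan.get(field)
            if value is None or value == "":
                missing += 1
    return missing / checked if checked else 0.0

# Two invented loan records: the second is missing an amount and a score.
sample = [
    {"borrower_name": "A. Smith", "loan_amount": 250000,
     "interest_rate": 0.065, "fico_score": 710},
    {"borrower_name": "J. Doe", "loan_amount": None,
     "interest_rate": 0.070, "fico_score": ""},
]
print(f"{missing_rate(sample):.0%} of critical fields missing")  # prints "25% of critical fields missing"
```

Run against a real loan pool, a check like this would have surfaced the 5-10% gap before the data was used to price risk.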

Good people used this bad banking data, made poor decisions, and exacerbated customer consequences from the financial meltdown. The impact of dirty data was so huge that the United States and European governments had to intervene.

The Origin of Dirty Data in Banking

The problems of dirty data stem from banks adapting to contextual shifts and technical advancements. In the years before the financial crisis of 2008, regulatory oversight and auditing grew less stringent. Meanwhile, advances in information technology made it easier to track and communicate financial information through spreadsheets, email, and the internet.

Despite the evolution of financial services, banks continued to handle data quality policies and procedures as they always had. Overlapping customer information existed and was stored across different banking departments: the same person might have a checking account and a mortgage, interacting with different departments. Banks continued to manually enter and transform data, increasing the likelihood of dirty data in any banking system.

Meanwhile, data was created and used for the same customer entities, but their information did not match across systems: a name might appear one way in one system and another way in a different one. All these factors exacerbated the cost of dirty data in banking during the financial meltdown in 2008.

The Soiling of Clean Data

The combination of regulations addressing the 2008 financial crisis, increasing competition from technology companies, and consumer demand for online banking services has led banks to clean up dirty data. Banks also invested in document collection automation, reducing the chances of data entry errors at the start.

While automating data entry increases the likelihood of getting cleaner data, this clean banking data will degrade over time as business conditions and customer activity change. Take, for example, the CARES Act in the United States, which mandated that banks grant forbearance on federally backed mortgage loans due to the pandemic’s economic effects.

In 2020, one bank automatically put homeowners with good credit who were making regular mortgage payments into forbearance, based on their online queries about bank services related to CARES. The customers had never opted for a delay in their mortgage payments.

The consequences of dirtying previously clean mortgage data meant those homeowners could not pay their monthly mortgage, refinance the mortgage, or take out another loan at another bank, as their credit reports flagged them as being in forbearance. The bank also faced a class action lawsuit for turning good clean data into bad.

Dirty Data Becomes More Expensive with Subsequent Transactions

If the costs of dirty data in banking could be contained to the bad business transaction itself, without affecting subsequent needs or services, banking customers would be happier. Unfortunately, dirty data, once transformed from clean data, continues to cause problems in other business processes. In particular, banks find it hard to locate and correct dirty data, or to keep clean data clean, when it passes through multiple disconnected systems.

As of March 2021, a Digital Banking Report found only 8% of banks had multiple systems connected to a unified database. Only 7% had a single system that did ‘everything.’ With customer data entry, usage, and reporting standards varying between departments, banks face significant challenges with the data available to them.

For starters, a business unit may try to fix dirty data and later find that it still exists in another business transaction, as in the mortgage forbearance confusion described above. Good banks, lenders, and payment processors get stuck with the bad data, unable to correct it for the customer.

The Cost of Dirty Data Is Too High

Without adequate data quality processes, cleaning, and testing, the banking data available today risks becoming the dirty data problem of tomorrow. And one major crisis, like a mortgage meltdown or a pandemic, tends to uncover dirty data in banking at the worst possible time.

Good people end up with little trust in their banking data and little they can do to fix it in a crisis. Good data keeps the banking business flowing smoothly, end to end.

About The Author

Michelle Knight

Michelle Knight has a Master of Library and Information Science and software testing experience at two banks. She writes about data quality and management. Her works have been cited by multiple publications, including Corporate Compliance Insights, the CDA Institute, and AIMS, a division of the Food and Agriculture Organization of the United Nations (FAO).

The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of The World Financial Review.