The State Of Data Quality In 2020 – Everything You Must Know

Melissa IN Team | Data Quality

Today, data is an organization’s most valuable asset. The way this data is managed and analyzed is changing the way business information is used, and people at all levels of the enterprise are now alert to the importance of data quality. Unfortunately, a survey found that very few organizations had dedicated data quality teams. The organizations were found to be dealing with multiple data quality issues and lacked the building blocks of data management and governance. This can have a negative impact on budget management, communication with consumers, marketing, sales and more.

Common Data Quality Issues

There are several areas that contribute to poor data quality. Key amongst them are:

Absence of Standardization Norms

This is one of the most common data quality issues faced by organizations. There is no lack of raw data available. However, this data comes in from multiple sources. The most common channels for data collection include sales representatives, websites, mobile apps and call centers. In many cases, the data does not follow a standardized format, which gives rise to inconsistencies. For example, at one point of data entry, a date may be entered in the DD/MM/YY format while at another it may be entered in the MM/DD/YY format. Similarly, a customer’s name may be entered with different spellings. This can not only cause confusion but also lead to duplication of records.
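
As a minimal sketch of what a standardization step could look like, the Python snippet below converts dates to a single ISO format based on the channel they came from. The channel names and the SOURCE_DATE_FORMATS mapping are hypothetical and chosen only for illustration. A real pipeline has to know which format each source actually uses, because a value like 03/04/20 is ambiguous on its own.

```python
from datetime import datetime

# Hypothetical mapping from data-entry channel to the date format it uses.
SOURCE_DATE_FORMATS = {
    "call_center": "%d/%m/%y",  # DD/MM/YY
    "website": "%m/%d/%y",      # MM/DD/YY
}


def normalize_date(raw: str, source: str) -> str:
    """Parse a date with the format declared for its source.

    Returns the date as an ISO 8601 string (YYYY-MM-DD) so that every
    record ends up in one format. Raises KeyError for an unknown source
    and ValueError if the text does not match the declared format.
    """
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(raw.strip(), fmt).date().isoformat()


# The same text means two different days depending on where it was entered.
print(normalize_date("03/04/20", "call_center"))
print(normalize_date("03/04/20", "website"))
```

Storing the normalized value alongside the original text makes it possible to re-check a record later if a source turns out to use a different format than assumed.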

Poor Data Quality Control at Entry Level

Many organizations still rely on manual data entry. This carries a high risk of error and is one of the leading causes of poor data quality. The person entering the data may type a wrong value, create duplicate records or even miss a field. Without proper quality control at this level, the errors move downstream and affect other records as well.
For example, when creating a daily sales report, the person may enter the number of products sold as 20 instead of 200. If not checked, this can affect reordering supplies, the company’s calculation of profits, etc. Poor quality control at data entry levels is responsible for common data errors such as incomplete data, outdated information and inaccuracies.
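
One way to catch this kind of slip at the point of entry is a simple plausibility check. The sketch below is only an illustration: the field names (such as units_sold), the tolerance factor of 5 and the idea of comparing against the median of recent entries are all assumptions, not a prescribed method.

```python
from statistics import median


def validate_entry(record, required_fields, recent_quantities, tolerance=5.0):
    """Return a list of problems found in one manually entered record.

    Two checks are applied:
    - every required field must be present and non-empty;
    - "units_sold" (a hypothetical field name) must lie within a factor
      of `tolerance` of the median of recent entries.
    """
    problems = []
    for field in required_fields:
        if record.get(field) in (None, ""):
            problems.append(f"missing field: {field}")

    qty = record.get("units_sold")
    if isinstance(qty, (int, float)) and recent_quantities:
        typical = median(recent_quantities)
        in_range = typical / tolerance <= qty <= typical * tolerance
        if typical > 0 and not in_range:
            problems.append(
                f"units_sold {qty} is far from the recent median {typical}"
            )
    return problems


recent = [180, 210, 195, 205]
entry = {"date": "2020-04-03", "units_sold": 20}  # 20 typed instead of 200
print(validate_entry(entry, ["date", "units_sold"], recent))
```

A flagged entry is not necessarily wrong, so a check like this is better used to ask the person for confirmation than to reject the record outright.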

Poor Quality Data from Third-Party Sources

Not all data used by businesses is generated in house. A lot of data is taken from third parties. This data is often subject to inaccuracies, outdated information, incomplete data and other such common errors. For example, companies may look at third-party survey results when planning a marketing campaign. Because they may get only a partial view of the survey, their control over data quality is limited. Many companies are beginning to realize this and are turning to government agencies for reliable third-party data. This is especially important when data is needed to verify identities.

Unstructured Data

While there is no dearth of data available, not all of this data is structured into usable form. Without proper labels and tags, it is very difficult to use this data. For example, you may have a phone number and address, but if these are not connected to a name, they are of little use. There is also often a lack of metadata such as the date the data was created or modified, the author, etc. Without this metadata, it is hard to assess how reliable the data is.

Lack of Internal Communication

Most businesses use 3-4 channels to collect and compile data. The data collected by one team is used not only by them but by others as well. For example, data about the density of consumers in different parts of the city may be used by the marketing team as well as the delivery team. However, these teams often do not communicate with each other. For example, the delivery team may realize that the phone number associated with a particular address is wrong. They may correct the number in their own records but may not share this with the other teams. Thus, if a third team wants data about the customer, it will get two records: one with an old, invalid phone number and one with the current number.
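
When records from several teams are brought together, a conflict like this can be resolved by keeping the most recently updated version of each customer. The following is a rough sketch that assumes every record carries a shared customer_id and an updated_at date in ISO format. Both field names are hypothetical, and the approach only works if teams actually record when they changed something.

```python
def latest_per_customer(records):
    """Keep one record per customer: the one with the newest updated_at.

    Assumes "updated_at" is an ISO 8601 date string (YYYY-MM-DD), so
    plain string comparison orders the dates correctly.
    """
    latest = {}
    for rec in records:
        key = rec["customer_id"]
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return list(latest.values())


marketing = [{"customer_id": 7, "phone": "555-0100", "updated_at": "2019-06-01"}]
delivery = [{"customer_id": 7, "phone": "555-0199", "updated_at": "2020-02-15"}]

# The third team sees a single record carrying the corrected number.
print(latest_per_customer(marketing + delivery))
```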

Absence of a Centralized Data Quality Strategy

A majority of organizations still do not have a centralized data quality strategy or a team responsible for data quality. As a result, there is no one to take ownership of data. Individual teams are left free to make their own decisions about how to collect data, how to tabulate it and what quality standards to maintain. But what counts as good for one team may not be good enough for another. For example, the pin code may not be a very important data field for the final delivery team but could be crucial for the marketing team.

Other Issues

There are a number of other issues that plague data quality. Many organizations simply do not have the relevant technology for data quality validation. In addition, there may be a shortage of internal staff and an insufficient budget to address the problem. Another common issue is inadequate support from senior management. Though they may use data, they do not take ownership of it and hence do not see the point of enhancing it.


Data quality is an issue that cannot be ignored. The challenges are numerous but they can be overcome. All organizations, big and small, need to address issues related to data quality and take formal steps to resolve them. Creating a dedicated data quality team is crucial to this. Artificial Intelligence initiatives may also be used to identify data quality issues and enhance data quality. AI-enriched tools can also improve data efficacy and increase productivity. When it comes to data quality, it is important to note that data enhancement is not a one-time effort but an ongoing process.

Melissa Data Publishes Insight on Top Global Data Quality Challenges

Blog Administrator | Address Validation, Address Verification, Analyzing Data, Analyzing Data Quality, Data Quality, Global Data Quality

After 30 years in the industry, we’ve seen firsthand what bad data does to good
companies. You’ll be surprised to learn some of the issues companies have with
their data. We’ve collected the top 30 most pervasive data quality issues – and
what you need to know to solve them. Download our free Melissa Data Magazine to
learn more. Also featured: our 2015 Data Quality Catalog, packed with info on
our smart, sharp tools that include free trials, source codes, and unlimited
tech support.



Get Used to It: Inconsistent Data is the New Normal

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Cleansing, Data Management, Data Quality

By Elliot King

Nobody is perfect and neither is corporate data. Indeed, data errors are intrinsic to IT’s DNA. Data inevitably decays. Errors can be caused when data from outside sources are merged into a system. And then, of course, the humans that interact with the system are, well, human.

Unfortunately, despite the best efforts of data quality professionals, the three major IT trends–analytics, big data, and unstructured data–while promising great payoffs generally, promise to exacerbate data quality issues.

Perhaps analytics presents the most interesting set of challenges. Intuitively, companies believe that the more information incorporated into the analytic process, the sounder the outcome. This leads companies to investigate or incorporate data sets that have been little used or overlooked in the past.

And when you look into new places, sometimes you find surprises. Patient records are perhaps the most well publicized example. Few people ever closely scrutinized the paper records maintained by most doctors. But now that patient information is being imported into electronic patient records, huge numbers of mistakes are coming to the surface–both those that the examining doctor made initially, and those from the import process itself.

The problems with electronic patient records are emblematic of the Achilles heel of big data in general. It seems pretty obvious that the more data you collect, the more mistakes will be embedded in the data. Quantity works against quality in most cases, particularly when the growth of data is being driven by a range of new input devices of uneven reliability, such as sensors and Web processes.

But the issue is not just the size of databases; it is also the nature of the data captured. The main driver of big data is unstructured information and, almost by definition, unstructured information is inexact, as are the methods for managing unstructured data (although they are consistently improving over time).

Face it, data has always been messy and is getting messier. Consequently, data quality efforts have to be consistent and ongoing.


Classifying Data Quality Problems

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Quality

By Elliot King

Data quality is generally most fruitfully defined in the context of its use. Is the data good enough to allow the process with which it is associated to run efficiently and effectively? For example, is the mailing list you are using for a direct solicitation accurate enough that you can achieve your goals and not generate any unwanted and unanticipated negative consequences?

And while that definition may be good enough in a practical sense for
specific issues, it really isn’t good enough to diagnose the sources of data
quality problems generally. Constructing a general framework for data quality
problems can be a useful guide in better identifying and resolving specific issues.

One of the earliest efforts to better understand the nature of data quality
problems calls for classifying problems into three general
categories–operational, conceptual and organizational. Operational data quality
issues are those that are generated through problems with data capture and
transmission. Inaccurate data is collected. Data may be missing. Or data may be
corrupted through some process, for example.

Conceptual data quality problems occur when data is not well defined or it is
inappropriate for its intended use. One of the most famous examples of a
conceptual data quality problem (though it is not often thought of in this way)
was brought to light in the movie Moneyball.

The basic thrust of the movie was not that the information old-time baseball
scouts used to evaluate players was wrong per se; it was that they were collecting
the wrong data to identify productive players. Batting average, for example, is
less useful in determining a player’s value than on-base percentage. A pressing
new conceptual data problem is the attempt to use electronic patient records to
judge medical treatment outcomes.

When operational and conceptual data problems persist over time despite repeated
attempts to fix them, organizational data quality problems are usually the
culprit. In these cases, wrong, missing and invalid data is not really the
problem, but the symptom. Something has to be fixed in the organizational
structure or culture.

The point is this–data can be wrong for many reasons and it can’t fundamentally
be fixed without a general understanding of the error’s cause.