Classifying Data Quality Problems

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Quality

By Elliot King

Data quality is generally most fruitfully defined in the context of its use. Is the data good enough to allow the process with which it is associated to run efficiently and effectively? For example, is the mailing list you are using for a direct solicitation accurate enough that you can achieve your goals and not generate any unwanted and unanticipated negative consequences?

And while that definition may be good enough in a practical sense for
specific issues, it isn't good enough to diagnose the sources of data quality
problems in general. A general framework for data quality problems can be a
useful guide to identifying and resolving specific ones.

One of the earliest efforts to better understand the nature of data quality
problems calls for classifying them into three general categories: operational,
conceptual and organizational. Operational data quality issues are generated by
problems with data capture and transmission. For example, inaccurate data may
be collected, data may be missing, or data may be corrupted by some process.
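Operational problems are the easiest of the three to check for mechanically. The sketch below assumes a simple mailing-list record, like the one in the direct-solicitation example above. It flags the three kinds of operational problems just described: missing fields, values captured in a form that cannot be correct, and corruption in transit, which it detects by comparing a checksum computed at the source. The field names, the US ZIP code rule and the use of SHA-256 are all illustrative assumptions, not part of any standard framework.

```python
import hashlib
import re
from typing import List, Optional

# Hypothetical mailing-list record layout; the field names are illustrative only.
REQUIRED_FIELDS = ("name", "street", "city", "zip")
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")  # US ZIP or ZIP+4 (assumed rule)


def record_checksum(record: dict) -> str:
    """Hash the required fields the same way at the source and at the destination."""
    payload = "|".join(str(record.get(f, "")) for f in REQUIRED_FIELDS)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def operational_issues(record: dict, source_checksum: Optional[str] = None) -> List[str]:
    issues = []
    # Missing data: a required field is absent or blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            issues.append(f"missing:{field}")
    # Bad capture: a value in a form that cannot be correct. Well-formed but
    # wrong values (a real ZIP for the wrong town) need other checks entirely.
    zip_code = str(record.get("zip", "")).strip()
    if zip_code and not ZIP_PATTERN.match(zip_code):
        issues.append("invalid:zip")
    # Corruption in transmission: the record no longer matches its source checksum.
    if source_checksum is not None and record_checksum(record) != source_checksum:
        issues.append("corrupted:checksum_mismatch")
    return issues


original = {"name": "A. Smith", "street": "1 Main St", "city": "Springfield", "zip": "12345"}
sent = record_checksum(original)
received = dict(original, zip="1234")  # a digit was dropped in transit
print(operational_issues(received, sent))  # ['invalid:zip', 'corrupted:checksum_mismatch']
```

Checks like these only catch problems that leave a visible trace. A plausible but wrong address passes every test above, which is one reason accuracy is ultimately judged against the data's intended use.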

Conceptual data quality problems occur when data is not well defined or is
inappropriate for its intended use. One of the most famous examples of a
conceptual data quality problem (though it is not often thought of in this way)
was brought to light in the movie Moneyball.

The basic thrust of the movie was not that the information old-time baseball
scouts used to evaluate players was wrong per se; it was that they were
collecting the wrong data to identify productive players. Batting average, for
example, is less useful than on-base percentage in determining a player's
value. A pressing new conceptual data problem is the attempt to use electronic
patient records to judge medical treatment outcomes.
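The batting average example shows how accurate data can still be the wrong data. Batting average counts only hits per at-bat, while on-base percentage also credits walks and hit-by-pitches, so the two can rank the same players differently. The sketch below uses the standard formulas, AVG = H / AB and OBP = (H + BB + HBP) / (AB + BB + HBP + SF). The two players and their statistics are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    name: str
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int

    def batting_average(self) -> float:
        # AVG = H / AB
        return self.hits / self.at_bats

    def on_base_percentage(self) -> float:
        # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
        times_on_base = self.hits + self.walks + self.hit_by_pitch
        denominator = self.at_bats + self.walks + self.hit_by_pitch + self.sacrifice_flies
        return times_on_base / denominator


# Hypothetical season lines: every number is accurate as recorded.
players = [
    BattingLine("Player A", at_bats=600, hits=180, walks=20, hit_by_pitch=2, sacrifice_flies=5),
    BattingLine("Player B", at_bats=500, hits=135, walks=100, hit_by_pitch=5, sacrifice_flies=5),
]

for p in players:
    print(f"{p.name}: AVG {p.batting_average():.3f}, OBP {p.on_base_percentage():.3f}")
# Player A: AVG 0.300, OBP 0.322
# Player B: AVG 0.270, OBP 0.393
```

Nothing in either player's record is inaccurate. The conceptual problem is choosing which measure to collect and rank by, and judging by batting average alone would rate Player A as the better hitter even though Player B reaches base far more often.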

When operational and conceptual data problems persist over time despite repeated
attempts to fix them, organizational data quality problems are usually the
culprit. In these cases, wrong, missing or invalid data is not really the
problem but a symptom of it. Something has to be fixed in the organizational
structure or culture.
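One way to put this rule of thumb into practice is to keep a log of data fixes and watch for issues that keep coming back. The sketch below assumes a hypothetical log in which each entry records one fix, tagged with the issue it addressed and its category. Any operational or conceptual issue fixed at least `threshold` times is flagged as a candidate organizational problem. The log format, the issue keys and the default threshold of 3 are arbitrary assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fix:
    issue: str     # hypothetical issue key, e.g. "missing:zip in signup feed"
    category: str  # "operational" or "conceptual"


def likely_organizational(fix_log: List[Fix], threshold: int = 3) -> List[str]:
    """Return issues that have been fixed at least `threshold` times.

    Repeated recurrence after fixes suggests the cause lies in the organization
    rather than in the data itself. The threshold is an assumed cutoff.
    """
    counts = Counter(f.issue for f in fix_log if f.category in ("operational", "conceptual"))
    return sorted(issue for issue, n in counts.items() if n >= threshold)


log = [Fix("missing:zip in signup feed", "operational")] * 4 + [
    Fix("stale address in CRM", "operational"),
]
print(likely_organizational(log))  # ['missing:zip in signup feed']
```

The flag only says where to look. Deciding what in the structure or culture keeps producing the bad data still takes human investigation.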

The point is this: data can be wrong for many reasons, and it can't be fixed at
the root without a general understanding of what caused the error.