By Elliot King

In many cases, quality, like beauty, is in the eye of the beholder. The exact characteristics that define quality can be hard to describe. For example, a news report recently described a new, synthetic method for producing diamonds. Would those diamonds be of the same quality as diamonds mined and refined in the conventional way? Would they be as desirable? The answer depends on whom you ask.

Fortunately, that is not the case with data quality. Beyond accuracy, high-quality
data generally has three clear and measurable characteristics: consistency,
completeness and compactness. The first characteristic is consistency. Because
information systems are complex, the same “fact” is often represented
inconsistently. Inconsistent, or dirty, data enters an information system when
integrity constraints, domain constraints and data rules are not rigorously
enforced.

A numerical representation of a month, for example, must fall between 1 and 12.
If the system requires two digits in the month field, the representation of a
month must be between 01 and 12. Because many organizations capture data through
various methods, the same month is too frequently represented in different ways,
such as “3”, “03” and “March”.
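One way to enforce that domain constraint is to normalize every incoming month to a single two-digit form and reject anything outside 01 to 12. The sketch below assumes month values arrive as strings, either as numbers or as English month names or abbreviations; the function name normalize_month is hypothetical, and it relies on the standard library's calendar module, whose month names follow the current locale (English by default).

```python
import calendar

# Hypothetical lookup: "january" -> 1, "jan" -> 1, and so on.
# calendar.month_name[0] and month_abbr[0] are empty strings, so they are skipped.
_MONTH_NAMES = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}
_MONTH_NAMES.update({abbr.lower(): i for i, abbr in enumerate(calendar.month_abbr) if abbr})


def normalize_month(raw: str) -> str:
    """Return the month as a two-digit string "01".."12", or raise ValueError."""
    value = raw.strip().lower()
    if value.isdecimal():
        month = int(value)
    elif value in _MONTH_NAMES:
        month = _MONTH_NAMES[value]
    else:
        raise ValueError(f"unrecognized month: {raw!r}")
    # Domain constraint: a month must fall between 1 and 12.
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {raw!r}")
    return f"{month:02d}"


print([normalize_month(m) for m in ["3", "03", "March", "mar"]])  # ['03', '03', '03', '03']
```

Raising an error, rather than silently guessing, keeps unrecognized values out of the system so they can be corrected at the source.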

Another common source of inconsistent data is a failure to adhere to business
rules. For example, an “order due” date should never be earlier than the
corresponding “order placed” date. Inconsistent data can cause significant
problems in downstream processing and analytics.
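A business rule like the order-date example can be checked mechanically across a batch of records. This is a minimal sketch that assumes a hypothetical Order record with order_placed and order_due fields (names chosen for illustration only) and flags every order that breaks the rule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    # Hypothetical record layout; field names are illustrative only.
    order_id: str
    order_placed: date
    order_due: date


def rule_violations(orders: list[Order]) -> list[str]:
    """Return the IDs of orders whose due date is earlier than their placed date."""
    return [o.order_id for o in orders if o.order_due < o.order_placed]


orders = [
    Order("A1", date(2024, 3, 1), date(2024, 3, 15)),
    Order("A2", date(2024, 3, 10), date(2024, 3, 5)),  # breaks the rule
]
print(rule_violations(orders))  # ['A2']
```

Running such a check at data entry prevents the problem; running it later at least measures how often the rule is broken.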

The second characteristic of high-quality data is completeness. Different parts
of an organization need different kinds of information, and data records should
provide the information needed by all stakeholders. For example, the maintenance
department of a new car dealership may want to link maintenance records to model
type and owner. The sales department may be most interested in the number of
visits each customer makes before a sale closes. The marketing department might
be most interested in basic customer information. A good information system will
capture all of that data.
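Completeness can be measured against each stakeholder's needs. The sketch below assumes a hypothetical mapping from department to required fields, loosely modeled on the dealership example. It reports which fields each department is missing and computes a simple completeness score: the share of all required fields that are populated.

```python
# Hypothetical requirements per stakeholder; field names are illustrative only.
REQUIRED_FIELDS = {
    "maintenance": {"owner_name", "model_type", "service_history"},
    "sales": {"owner_name", "visits_before_sale"},
    "marketing": {"owner_name", "email", "postal_code"},
}


def _is_present(record: dict, field: str) -> bool:
    # Treat absent keys, None and empty strings as missing; 0 is a valid value.
    return record.get(field) not in (None, "")


def missing_fields(record: dict) -> dict[str, set[str]]:
    """Map each department to the required fields absent from the record."""
    gaps = {}
    for department, fields in REQUIRED_FIELDS.items():
        missing = {f for f in fields if not _is_present(record, f)}
        if missing:
            gaps[department] = missing
    return gaps


def completeness(record: dict) -> float:
    """Fraction of all distinct required fields that the record populates."""
    required = set().union(*REQUIRED_FIELDS.values())
    present = [f for f in required if _is_present(record, f)]
    return len(present) / len(required)


record = {"owner_name": "J. Smith", "model_type": "sedan", "visits_before_sale": 3, "email": ""}
print(missing_fields(record))
print(completeness(record))  # 0.5
```

A per-department report shows who is affected by a gap, while the single score is convenient for tracking completeness over time.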

The third characteristic of high-quality data is compactness. Redundant data
(multiple records reflecting the same person, for example) fuels significant
data problems. Perhaps most damaging, redundant records can be very misleading.
If multiple records represent a single customer, a company may overestimate the
number of customers it has or underestimate the value of an individual customer.
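To see how duplicates distort both numbers, the sketch below collapses records that share a match key and sums their purchases. The matching rule is an assumption: an exact match on case- and whitespace-normalized name plus email. Real deduplication often needs fuzzier matching, and the function names here are hypothetical.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple[str, str]:
    """Hypothetical matching rule: normalized name plus normalized email."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def merge_duplicates(records: list[dict]) -> dict[tuple[str, str], float]:
    """Collapse records that share a match key, summing their purchase totals."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for r in records:
        totals[match_key(r)] += r["purchases"]
    return dict(totals)


records = [
    {"name": "Ann Lee", "email": "ann@example.com", "purchases": 120.0},
    {"name": "ann  lee", "email": "ANN@example.com ", "purchases": 80.0},
    {"name": "Bob Roy", "email": "bob@example.com", "purchases": 50.0},
]
merged = merge_duplicates(records)
print(len(records), len(merged))  # 3 2
```

Counting raw records reports three customers instead of two, and spreads Ann Lee's 200.0 in purchases across two apparently smaller customers.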

Consistency, completeness and compactness are essential characteristics of
high-quality data. They can be identified, measured and, if needed, rectified.
But doing so takes effort, attention and commitment.