Data Quality May Be in the Eye of the Beholder


By Elliot King

You know the old cliché that beauty is in the eye of the beholder? There should be a
similar aphorism for data quality.
That is one of the findings of a systematic overview of research into
information quality by Mouzhi Ge and Markus Helfert, researchers affiliated
with the School of Computing at Dublin City University. They reported it at the
International Conference on Information Quality, jointly sponsored by the MIT
Information Quality Program and the UALR Information Quality Graduate Program.

Information quality, the researchers observed, can be defined from two
perspectives. It can be assessed through the eyes of the information or data
consumer, or it can be considered through a more technical perspective. For the
information consumer, quality can be defined as “fit to use.” Will the data
allow the users to do what they want to do? Technically, however, data quality
is defined as meeting certain criteria and requirements.

The differences in these perspectives are more than just theoretical. They drive
the way different communities view and address data quality issues. For example,
from a technical perspective, a record may have a spelling error. From the data
consumers’ perspective, the information is not accessible. Along the same lines,
from a technical perspective, a record may have an incorrect value. From the
user’s perspective, the data is unreliable or not credible. And so on.
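
The link between the two views can be sketched in a few lines of Python. This is only an illustration, not anything taken from Ge and Helfert: the two defect-to-perception pairs come from the examples above, while the field names, the reference list of cities, the age range, and the function names are all hypothetical.

```python
# Hypothetical mapping from a technical defect to how a data consumer
# experiences it; the two pairs are the examples given in the text.
CONSUMER_VIEW = {
    "spelling_error": "not accessible",
    "incorrect_value": "unreliable or not credible",
}

# Hypothetical reference data, assumed only for this illustration.
KNOWN_CITIES = {"Dublin", "Boston"}


def technical_defects(record: dict) -> list[str]:
    """Return the technical defects found in one record (illustrative rules)."""
    defects = []
    # A city that is not in the reference list is treated as a spelling error.
    if record.get("city") not in KNOWN_CITIES:
        defects.append("spelling_error")
    # An age that is not an integer in a plausible range is an incorrect value.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        defects.append("incorrect_value")
    return defects


def consumer_impact(record: dict) -> list[str]:
    """Translate each technical defect into the consumer's perception of it."""
    return [CONSUMER_VIEW[d] for d in technical_defects(record)]


record = {"city": "Dubln", "age": 430}
print(technical_defects(record))
print(consumer_impact(record))
```

The point of keeping the mapping as a separate table is that the technical checks and the consumer-facing vocabulary can evolve independently, which is the kind of link between perspectives the researchers describe.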

A well-crafted data quality program should be based on a definition of data
quality that reflects both perspectives. The working definition for information
quality Ge and Helfert propose is that quality reflects data that is free of
defects and contains the features needed by users. The challenge for data
quality professionals is to link the technical definition of data quality (data
that meets technical requirements and is free from errors) to the definition
used by consumers of the data: can I do what I want to do with the data
efficiently?