By Elliot King


As everybody knows, data quality is usually measured along seven dimensions: the four Cs of completeness, coverage, consistency, and conformity, plus timeliness, accuracy, and duplication. The general method for judging data quality is to establish a standard for each of these dimensions and measure how much of the data meets it.

For example, how many records are complete? That is, how many of your records contain all of the essential information that your standard requires them to hold? Or how much of your data is accurate? That is, do the values in the records actually reflect something in the real world?
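To make the completeness question concrete, here is a minimal sketch in Python. The required fields and the sample records are invented for illustration; a real standard would name whatever your organization considers essential.

```python
# Hypothetical standard: every record must carry these fields.
REQUIRED_FIELDS = ("name", "email", "postal_code")


def is_complete(record, required=REQUIRED_FIELDS):
    """A record is complete when every required field has a non-blank value."""
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def completeness(records, required=REQUIRED_FIELDS):
    """Share of records that meet the standard (0.0 when there are no records)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(is_complete(r, required) for r in records) / len(records)


# Made-up sample data, for illustration only.
sample = [
    {"name": "Ann Lee", "email": "ann@example.com", "postal_code": "21201"},
    {"name": "Bo Cruz", "email": "", "postal_code": "10001"},   # blank email
    {"name": "Cy Dunn", "email": "cy@example.com"},             # no postal code
    {"name": "Di Park", "email": "di@example.com", "postal_code": "60601"},
]

print(f"{completeness(sample):.0%} of records are complete")  # prints "50% of records are complete"
```

Note that the score depends entirely on which fields the standard declares essential. Change the list and the same data gets a different grade.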

As Malcolm Chisholm pointed out in a series of posts not long ago, conceptualizing data quality as a set of dimensions may be misleading, or at least not that useful. The argument is both philosophical and practical. While philosophers can debate the relationship of an abstraction to the real world, the practical concerns about the dimensions of data quality raise interesting questions.

The real issue is this: as they are currently conceptualized, are data quality dimensions too abstract? Do they actually reveal something real, meaningful, and useful about the data itself? And does measuring data according to those standards (that is, establishing its quality) lead to useful directions for improving business processes?

For example, the International Association for Information and Data Quality defines timeliness as “a characteristic of information quality measuring the degree to which data is available when knowledge workers or processes require it.”

Obviously, the sense of timeliness in that definition reflects more on the ability to get at data when it is needed than on any quality of the data itself. However, timeliness could also reflect how up to date the data is.

Do records contain the most current information? But timeliness in that sense could also be subsumed under the idea of accuracy: if the information is not up to date, perhaps it is just inaccurate. Looked at through another lens, however, even if the data is not timely (that is, not up to date), maybe the record is not inaccurate per se but just incomplete.
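The overlap is easy to see with a single stale record. The sketch below is only an illustration: the field names, the one-year freshness threshold, and the dates are all assumptions, not part of any published standard.

```python
from datetime import date


def dimensions_flagged(record, current_value, as_of, max_age_days=365):
    """Return every dimension under which one stale record could be counted."""
    flagged = []
    if (as_of - record["last_updated"]).days > max_age_days:
        flagged.append("timeliness")    # not refreshed recently enough
    if record["city"] != current_value:
        flagged.append("accuracy")      # does not match the real world today
    if current_value not in record.get("city_history", [record["city"]]):
        flagged.append("completeness")  # the latest value is simply missing
    return flagged


# Hypothetical customer who has moved since the record was last touched.
record = {"city": "Baltimore", "last_updated": date(2010, 1, 15)}

print(dimensions_flagged(record, "Chicago", as_of=date(2012, 1, 15)))
# prints ['timeliness', 'accuracy', 'completeness']
```

One outdated address shows up as a defect under three different headings, so adding up per-dimension scores would count the same problem three times.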

Clearly, assessing quality according to individual dimensions is a tricky business. The dimensions can overlap and, when used without caution, can lead to more confusion than clarity.




