Everybody loves quality, or so they say. People want to buy high-quality goods; they want their children to get a quality education; when they buy tickets, they hope to attend a quality event. But though the word is bandied about regularly (and let's be honest, we all have some notion of what quality means), nailing down a firm definition is surprisingly elusive. Is quality simply in the eye of the beholder? Do we just know quality when we see it?
Defining data quality is just as tricky as defining quality in general, but the stakes are higher. You and I can agree to disagree about whether a piece of art is high quality; when companies work with low-quality data, there will be problems, and more than likely big problems.
According to an interesting report published by Capgemini, data quality has five dimensions. The first dimension is completeness: data is complete when none of its expected fields are missing. For example, high-quality address data will include everything associated with an address, including a correct ZIP code.
The second dimension of data quality is that the data conforms to the appropriate standards. The reason this dimension is so important is that most data is shared among many users. If the data does not conform to standards, it cannot be easily used by all the parties who may need to use it.
Adhering to standards, in this definition, is followed by three more dimensions: internal consistency, accuracy, and a time stamp that clearly defines the period within which the data is valid.
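To make the five dimensions concrete, here is a minimal sketch of what checking an address record against them might look like. The field names, the ZIP-code pattern, the toy state lookup, and the validity dates are all hypothetical, chosen for illustration; a real system would check against actual reference data and published standards.

```python
import re
from datetime import date

REQUIRED_FIELDS = ["street", "city", "state", "zip"]   # completeness
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")          # conformity: US ZIP format

def check_quality(record, reference=None, today=None):
    """Return a dict mapping each quality dimension to a pass/fail result."""
    today = today or date.today()
    results = {}
    # 1. Completeness: no expected field is missing or empty.
    results["completeness"] = all(record.get(f) for f in REQUIRED_FIELDS)
    # 2. Conformity: the ZIP code matches the standard format.
    results["conformity"] = bool(ZIP_PATTERN.match(record.get("zip", "")))
    # 3. Internal consistency: the state agrees with the ZIP prefix
    #    (a toy lookup standing in for a real reference table).
    zip_to_state = {"10": "NY", "90": "CA"}
    results["consistency"] = zip_to_state.get(record.get("zip", "")[:2]) == record.get("state")
    # 4. Accuracy: the record matches a trusted reference source, if one exists.
    results["accuracy"] = (record == reference) if reference else None
    # 5. Timeliness: today falls within the record's stated validity window.
    results["timeliness"] = record["valid_from"] <= today <= record["valid_to"]
    return results

record = {
    "street": "350 Fifth Ave", "city": "New York", "state": "NY",
    "zip": "10118", "valid_from": date(2024, 1, 1), "valid_to": date(2026, 1, 1),
}
print(check_quality(record, today=date(2025, 6, 1)))
```

Note how each dimension needs a different kind of evidence: completeness and conformity can be checked against the record alone, consistency needs cross-field rules, while accuracy and timeliness require outside information (a reference source and a clock).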
Those five dimensions are not the only way to define data quality. Some people suggest data quality is measured by how well the data represents the real-world construct to which it refers. But the five dimensions offer one useful way of thinking about, and measuring, data quality.