It seems that it should be rather easy to gather and maintain high-quality data,
but in the real world in which you and I live, that is not the case. Companies
gather data from so many sources that errors inevitably make their way into
valuable company databases. Consider a typical customer-oriented database. End
users may enter data via a Web interface, and mistakes can happen.

Customer service representatives may input information about specific customers
based on call center activity, and they could hit the wrong key. Data may be
transcribed incorrectly (think of a medical office here). There could be
transmission errors and processing errors, as flaws in software are not as
infrequent as you may think. Finally, additional data may be integrated from a
third-party vendor, and that data could be flawed in, well, let me count the ways.


Validation vs. Verification

With that in mind, two of the critical components of a data quality program are
validation and verification. These two concepts are often confused, but they are
not the same. Validation means that the data adheres to a specified and expected
format. For example, a ZIP Code will have five digits or, if you are using
ZIP+4, nine digits. If a ZIP Code recorded in a database has five digits, it is
valid. The data fits the expected and required format.
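
As a minimal sketch of what such a format check might look like, the snippet below tests whether a string has the shape of a ZIP or ZIP+4 code. The function name `is_valid_zip` and the decision to allow an optional hyphen before the four-digit extension are assumptions made for illustration, not rules taken from any particular system.

```python
import re

# Five digits, optionally followed by a four-digit ZIP+4 extension.
# Allowing an optional hyphen before the extension is an assumption here.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-?[0-9]{4})?")


def is_valid_zip(value: str) -> bool:
    """Return True when value has the shape of a ZIP or ZIP+4 code."""
    return ZIP_PATTERN.fullmatch(value) is not None


print(is_valid_zip("90210"))       # True
print(is_valid_zip("90210-1234"))  # True
print(is_valid_zip("9021"))        # False
```

Note that the check looks only at the shape of the value. A string such as "00000" passes whether or not any such ZIP Code exists.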

But even if the data is “valid” in data quality terms, it could still be wrong.
Consider the word “great.” Obviously, the word “great” must have five letters
for it to be correct. (“Gr8” is not actually correct.) If you write the word
“grate” in your database, the data would have five letters, so it would be
valid. But it would also be wrong.

The process of ensuring that your data is actually correct, as well as being in
the correct format, is called verification. Verification involves comparing the
data in your database to some standard. One of the most common verification
processes is requiring people to enter their password twice when signing up for
something.
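
The double-entry check can be sketched in a few lines. The function name `entries_match` is hypothetical, and a real sign-up form would do more than this (hashing the password before storing it, for example).

```python
def entries_match(first_entry: str, second_entry: str) -> bool:
    """Return True when the two entries are exactly the same."""
    return first_entry == second_entry


print(entries_match("s3cret!", "s3cret!"))  # True
print(entries_match("s3cret!", "s3cert!"))  # False
```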

There are several technical approaches to ensuring that data is valid, including
checking the length and range of different fields as well as making sure that
all the necessary fields are completed. Ah, those pesky red asterisks we see
designating a mandatory field on so many forms we fill out.
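
Here is a rough sketch of those three kinds of checks (required fields, length, and range) applied to a single record. The field names, the 50-character limit, and the 0 to 120 age range are all invented for the example. Real limits would come from your own data definitions.

```python
# Hypothetical rules for a customer record; adjust to your own schema.
REQUIRED_FIELDS = ("name", "email", "age")
MAX_NAME_LENGTH = 50
AGE_RANGE = (0, 120)


def validate_record(record: dict) -> list:
    """Return a list of validation problems; an empty list means valid."""
    errors = []

    # Required-field check: the value must be present and not blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            errors.append(f"{field} is required")

    # Length check on the name field.
    name = str(record.get("name") or "")
    if len(name) > MAX_NAME_LENGTH:
        errors.append(f"name is longer than {MAX_NAME_LENGTH} characters")

    # Range check on the age field, only when something was entered.
    age = record.get("age")
    if age is not None and str(age).strip():
        try:
            age_value = int(age)
        except (TypeError, ValueError):
            errors.append("age must be a whole number")
        else:
            low, high = AGE_RANGE
            if not low <= age_value <= high:
                errors.append(f"age must be between {low} and {high}")

    return errors


print(validate_record({"name": "Ada", "email": "ada@example.com", "age": 36}))
print(validate_record({"name": "", "email": "ada@example.com", "age": 150}))
```

Returning a list of problems rather than a single true or false makes it possible to show the user every field that needs attention at once.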

Data verification is much trickier. Take the example of the password. If the
password is entered incorrectly the same way both times, the data in the
database will be wrong as well. In most cases, data verification requires
comparing one piece of data to another that you know is correct.
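
To make the idea concrete, the sketch below compares a recorded city against a small reference table keyed by ZIP Code. The table `TRUSTED_ZIP_TO_CITY` is a hand-typed stand-in for whatever authoritative source you actually trust, and the function name `verify_city` is hypothetical.

```python
# Stand-in for a trusted reference source; a real one would be far larger.
TRUSTED_ZIP_TO_CITY = {
    "10001": "New York",
    "60601": "Chicago",
    "90210": "Beverly Hills",
}


def verify_city(zip_code: str, city: str) -> str:
    """Compare a recorded city with the trusted city for that ZIP Code.

    Returns "verified", "mismatch", or "unknown" (no reference entry).
    """
    expected = TRUSTED_ZIP_TO_CITY.get(zip_code)
    if expected is None:
        return "unknown"
    # Ignore case and surrounding spaces when comparing.
    if expected.casefold() == city.strip().casefold():
        return "verified"
    return "mismatch"


print(verify_city("90210", "Beverly Hills"))  # verified
print(verify_city("90210", "Chicago"))        # mismatch
print(verify_city("12345", "Schenectady"))    # unknown
```

The "unknown" result matters: when the reference has nothing to compare against, the record is neither confirmed nor contradicted, and it should not be treated as verified.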

The bottom line: data validation is an essential tool for ensuring data quality,
but it is not enough. As well as being in the right format, you want your data
to be right.

