By Elliot King

Mistakes happen. No matter how effective your data quality program is, no matter how well trained your personnel are, and no matter how aware you are of the high cost of low data quality, data errors will creep into your databases. The reason is simple. Before information winds up in a database, it passes through a series of steps, from data acquisition to archival storage, that involve both human interaction and computation. With so many opportunities for things to go astray, from time to time they inevitably will.

Perhaps the most common source of data errors, and one of the most difficult to
correct, is data entry itself. In many cases, humans (typically data entry clerks,
customer service representatives, or end users via the Web or some other
mechanism) initially enter data into the system.

They enter data incorrectly for many reasons. Data entry personnel or customer
service representatives may be required to work too fast. The entry screen may
be poorly designed. Or, in some cases, data may be incorrectly entered
intentionally. For example, in several cases across the country, crime lab
personnel have been found to have entered data into their systems for tests that
they did not perform. And a recent study of the use of electronic medical
records found that medical personnel may be falsely claiming to have performed
specific examinations.

Incorrect data entry, accidental or not, is only one part of the problem.
Sometimes the data itself is just wrong: measurements may have been taken
incorrectly, sampling techniques may be flawed, and so on. As data volumes increase,
controlling the quality of the initial data becomes more difficult. Moreover,
data analysis frequently does not require, or cannot use, all the data
collected, and the techniques used to distill or summarize data may be faulty.

Finally, in many cases, data does not come from a single source. With data
flowing into systems from several different directions, data integration is a
challenge. In practice, databases are always evolving, and as more data is
incorporated into different repositories, inconsistencies continually arise and
must be resolved.

There is some hope. If you pay attention, the amount of incorrect data in your
system can be reduced, but probably cannot be totally eliminated. The sources of
errors are just too pervasive.

