Mistakes Are All Around Us
By Elliot King
Perhaps the most common source of data errors, and one of the most difficult to
correct, is data entry itself. In many cases, humans (typically data entry
clerks, customer service representatives, or end users via the Web or some other
mechanism) initially enter data into the system.
They enter data incorrectly for many reasons. Data entry personnel or customer
service representatives may be required to work too fast. The entry screen may
be poorly designed. Or, in some cases, data may be incorrectly entered
intentionally. For example, in several cases across the country, crime lab
personnel have been found to have entered data into their systems for tests that
they did not perform. And a recent study of the use of electronic medical
records found that medical personnel may be falsely claiming to have performed
Incorrect data entry, accidental or not, is only one part of the problem.
Sometimes the data itself is just wrong: measurements may have been taken
incorrectly, sampling techniques may be flawed, and so on. As data volumes
increase, controlling the quality of the initial data becomes more difficult.
Moreover, data analysis frequently does not require, or cannot use, all the data
collected, and the techniques used to distill or summarize it may be faulty.
Finally, in many cases, data does not come from a single source. With data
flowing into systems from several different directions, data integration is a
challenge. In practice, databases are always evolving, and as more data is
incorporated into different repositories, inconsistencies must always be
There is some hope. With careful attention, you can reduce the amount of
incorrect data in your system, but you probably cannot eliminate it entirely.
The sources of errors are just too pervasive.