Data Quality Problems are Predictable

Blog Administrator | Analyzing Data Quality, Data Management, Data Quality

By Elliot King

The idea that poor data quality is costly and hurts performance is about as old as science itself. The influential science writer Stephen Jay Gould wrote a whole book about how faulty data leads to faulty conclusions, often to the great detriment of society. And one of the most lasting aphorisms in computing has been “garbage in, garbage out.”

Moreover, the problems and risks of poor data quality have been studied,
described and quantified for decades. Data scientists have explored how to
ameliorate data quality problems, software vendors have developed the needed
tools, and companies have invested heavily in technology to rectify the
shortcomings in their data.

So why do these problems persist despite efforts to correct them, and why are
they so predictable? The most obvious reason is that in most organizations, data
quality issues are not top-line agenda items for those in a position to ensure
that they are regularly addressed. Too often, nobody truly feels that they “own”
specific data, so when problems arise, everyone assumes that somebody else will
fix them. Even worse, in some cases users may not even feel the need to address
bad data at all.

But the most obvious reason is not always the most compelling. After all,
employees can be trained to be alert to data problems. They can be educated
about the impact of poor data quality on ongoing operations and company success.
Safeguarding data quality can be made an explicit part of their jobs.

All that will help immensely, but it won’t completely prevent data quality
challenges. The primary reason data quality problems continue to haunt us is
that data flows are open. Data continually streams in from a wide variety of
sources, many of which are only marginally controlled. Moreover, the internal
use of data changes over time, which places new demands on it. In short,
corporate data should be thought of as organic: it grows and changes over
time, as do its uses.

So you should come to terms with data quality issues and accept that they will
always arise. Data quality improvement must be a continual process designed to
limit the negative impact of bad data. But bad data will never be entirely