Get Used to It: Inconsistent Data is the New Normal

Melissa Team | Address Quality, Analyzing Data, Analyzing Data Quality, Data Cleansing, Data Management, Data Quality

By Elliot King

Nobody is perfect and neither is corporate data. Indeed, data errors are intrinsic to IT’s DNA. Data inevitably decays. Errors can be introduced when data from outside sources are merged into a system. And then, of course, the humans who interact with the system are, well, human.

Unfortunately, despite the best efforts of data quality professionals, the three major IT trends (analytics, big data, and unstructured data) promise great payoffs but are also likely to exacerbate data quality issues.

Perhaps analytics presents the most interesting set of challenges. Intuitively, companies believe that the more information incorporated into the analytic process, the sounder the outcome. This leads companies to investigate or incorporate data sets that have been little used or overlooked in the past.

And when you look into new places, sometimes you find surprises. Patient records are perhaps the best-publicized example. Few people ever closely scrutinized the paper records maintained by most doctors. But now that patient information is being imported into electronic patient records, huge numbers of mistakes are coming to the surface: both those that the examining doctor made initially and those introduced by the import process itself.

The problems with electronic patient records are emblematic of the Achilles heel of big data in general. It seems pretty obvious that the more data you collect, the more mistakes will be embedded in the data. Quantity works against quality in most cases, particularly when the growth of data is being driven by a range of new input devices of uneven reliability, such as sensors and Web processes.

But the issue is not just the size of databases; it is also the nature of the data captured. The main driver of big data is unstructured information, and almost by definition unstructured information is inexact, as are the methods for managing it (although they are steadily improving).

Face it: data has always been messy and is getting messier. Consequently, data quality efforts have to be consistent and ongoing.