Achieving “Proactivity?”


By David Loshin

Standardizing the approaches and methods used for reviewing data errors, performing root cause analysis, and designing and applying corrective or remedial measures all help ratchet an organization’s data quality maturity up a notch or two. This is particularly effective when fixing the processes that allow data errors to be introduced in the first place eliminates those errors altogether.

In cases where the root cause cannot feasibly be addressed, we still have another standardized approach: defining data validity rules that can be incorporated into probe points within the processes to monitor compliance with expectations and to alert a data steward as early as possible when invalid data is detected.
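To make the idea of a probe point a little more concrete, here is a minimal sketch in Python. It assumes records arrive as plain dictionaries and that "alerting a data steward" can be modeled as a callback; the names `ValidityRule`, `ProbePoint`, `Incident`, and the example rules are hypothetical and are not drawn from any particular tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class ValidityRule:
    """A named expectation that every record passing a probe point should satisfy."""
    name: str
    check: Callable[[Record], bool]


@dataclass
class Incident:
    """One rule violation observed at one probe point (hypothetical structure)."""
    probe: str
    rule: str
    record: Record


class ProbePoint:
    """Hypothetical inspection hook placed at a single step of a data flow."""

    def __init__(self, name: str, rules: List[ValidityRule],
                 notify: Callable[[Incident], None]) -> None:
        self.name = name
        self.rules = rules
        self.notify = notify  # assumption: stands in for opening a ticket for the steward

    def inspect(self, record: Record) -> bool:
        """Check a record against every rule and report each violation as it is found."""
        valid = True
        for rule in self.rules:
            if not rule.check(record):
                valid = False
                self.notify(Incident(self.name, rule.name, record))
        return valid


def is_five_digit_zip(record: Record) -> bool:
    """Example rule: the 'zip' field must be exactly five ASCII digits."""
    zip_code = str(record.get("zip", ""))
    return len(zip_code) == 5 and zip_code.isascii() and zip_code.isdigit()


# Usage: a probe point at a hypothetical order-intake step.
rules = [
    ValidityRule("customer_id present", lambda r: bool(r.get("customer_id"))),
    ValidityRule("zip is 5 digits", is_five_digit_zip),
]
alerts: List[Incident] = []
probe = ProbePoint("order-intake", rules, alerts.append)

probe.inspect({"customer_id": "C001", "zip": "02134"})  # passes both rules
probe.inspect({"customer_id": "", "zip": "2134"})       # violates both rules
```

Collecting incidents through a single callback is just one design choice; the point is that the rules are defined once and the same check runs wherever the probe is placed, rather than being re-implemented at each step.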

This certainly reduces the “reactive culture” I discussed in one of my previous posts. Governing data stewardship activities by combining automated inspection tools (such as data profiling), automated data correction and cleansing tools, and incident management also reduces duplicated analysis efforts as well as repetitive fixes applied at different places and times. In fact, many organizations consider this level of maturity to be proactive data quality management, because it anticipates the need to address issues that are already known.

However, I might take a slightly contrarian view on this: to truly be proactive, you’d have to go beyond anticipating what you already know. In this light, we might say that instituting controls supporting inspection, monitoring, and notification is less about being proactive and more about being reactive much earlier in the process.

To really be proactive, it might be more worthwhile to anticipate the types of errors that you don’t already know about. Instead of only using profiling tools to look for existing patterns and errors, you might use these analytical tools to understand the methods and channels through which any type of potential error could occur, and then attempt to control the introduction of flawed data before it ever leads to any material impact!
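As one possible illustration of pointing profiling at channels rather than at known errors, the Python sketch below reduces each value to its "shape" (letters become `A`, digits become `9`) and tallies those shapes separately for each entry channel. The channel names, the phone-number data, and the functions `value_pattern` and `patterns_by_channel` are all hypothetical. The assumption is that a channel admitting many distinct shapes is a place where new, not-yet-seen errors can slip in, even if none of today's values is flagged as invalid.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: letters -> 'A', digits -> '9', other characters kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def patterns_by_channel(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tally value patterns separately for each channel through which data enters."""
    result: Dict[str, Counter] = defaultdict(Counter)
    for channel, value in rows:
        result[channel][value_pattern(value)] += 1
    return dict(result)


# Usage with made-up phone numbers captured through two hypothetical channels.
rows = [
    ("web_form", "617-555-0101"),
    ("web_form", "617-555-0199"),
    ("call_center", "617-555-0123"),
    ("call_center", "(617) 555 0145"),
    ("call_center", "6175550177"),
]
profile = patterns_by_channel(rows)
for channel, counts in sorted(profile.items()):
    print(f"{channel}: {len(counts)} distinct pattern(s)")
# call_center: 3 distinct pattern(s)
# web_form: 1 distinct pattern(s)
```

In this toy data the web form appears to constrain input while the call center accepts free text, which would suggest where a control (such as an input mask) could prevent flawed data from being introduced at all.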