Where to Start with Data Quality
By David Loshin
In one of my previous posts, I noted that there are two common questions about a data quality program. My last post dealt with the first, how to develop a business justification, so this week we will look at the second: where to start.
I believe that there are three key tasks that must be done at the beginning of a data quality initiative, all intended to help laser-focus the plan to best address the most critical needs:
- Solicit data quality requirements from the business users. Don’t directly ask the users about their data quality rules. Instead, devise a standardized approach to interviewing the business users about the way they rely on information within the context of their day-to-day activities. Document their description of their data use, extract their key dependencies, synthesize some quantifiable measures, and then reflect those measures back to them to validate your analysis.
- Perform a data quality assessment. Combine the use of statistical profiling tools and qualitative review to assess the degree to which data sets comply with user expectations and to find potential anomalies that require closer inspection to determine criticality.
- Establish a process and repository for data quality incident reporting and tracking. Provide a centralized management process that allows the users to report issues as they are identified, automate data practitioner notification, provide a set of rules for prioritization and evaluation, and then provide a workflow management scheme that ensures that high priority issues are addressed within a reasonable time frame.
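To make the assessment task a little more concrete, here is a minimal sketch of what statistical profiling of a single column might look like. It is not the output of any particular profiling tool; the function names (`profile_column`, `meets_expectation`), the sample records, and the 95% completeness threshold are all assumptions chosen for illustration. In practice the threshold would come from the measures you validated with the business users.

```python
from collections import Counter


def profile_column(rows, column):
    """Return basic profiling statistics for one column of a list of dict records."""
    values = [row.get(column) for row in rows]
    total = len(values)
    # Treat None and blank strings as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "total": total,
        "missing": total - len(present),
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


def meets_expectation(profile, min_completeness):
    """Compare a profile against a user-supplied completeness expectation."""
    return profile["completeness"] >= min_completeness


# Hypothetical sample data; real records would come from the data set under review.
customers = [
    {"id": 1, "state": "NY"},
    {"id": 2, "state": ""},
    {"id": 3, "state": "NY"},
    {"id": 4, "state": None},
    {"id": 5, "state": "ny"},
]

state_profile = profile_column(customers, "state")
print(state_profile)
print("Meets 95% completeness expectation:", meets_expectation(state_profile, 0.95))
```

The numbers alone will not tell you whether "NY" and "ny" are a real problem or how much the two missing values matter. That is where the qualitative review comes in: the profile only flags the anomalies that deserve closer inspection.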
In essence, these three steps will allow you to assess the current state of data quality, document the most critical issues, and prioritize the methods by which those issues are investigated and resolved.
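The incident reporting and tracking task can be sketched in the same spirit. The example below is a rough illustration only: the priority levels, the prioritization rule, the resolution targets, and every name in it (`Incident`, `IncidentLog`, `prioritize`) are hypothetical, and a real repository would persist incidents and route notifications through whatever tools your organization already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical priority levels and resolution targets; yours will differ.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}
RESOLUTION_TARGETS = {
    "high": timedelta(days=1),
    "medium": timedelta(days=5),
    "low": timedelta(days=30),
}


def prioritize(blocks_work, affected_processes):
    """Illustrative rule: work stoppages and widely felt issues come first."""
    if blocks_work or affected_processes >= 3:
        return "high"
    if affected_processes == 2:
        return "medium"
    return "low"


@dataclass
class Incident:
    description: str
    reported_by: str
    affected_processes: int
    blocks_work: bool
    reported_at: datetime
    priority: str
    resolved: bool = False

    @property
    def due_at(self):
        return self.reported_at + RESOLUTION_TARGETS[self.priority]


class IncidentLog:
    """In-memory stand-in for a centralized incident repository."""

    def __init__(self, notify=None):
        self.incidents = []
        self.notify = notify  # optional callback used to alert data practitioners

    def report(self, description, reported_by, affected_processes, blocks_work, reported_at):
        priority = prioritize(blocks_work, affected_processes)
        incident = Incident(description, reported_by, affected_processes,
                            blocks_work, reported_at, priority)
        self.incidents.append(incident)
        if self.notify is not None:
            self.notify(incident)
        return incident

    def queue(self):
        """Open incidents, highest priority first, oldest first within a priority."""
        open_incidents = [i for i in self.incidents if not i.resolved]
        return sorted(open_incidents,
                      key=lambda i: (PRIORITY_ORDER[i.priority], i.reported_at))

    def overdue(self, now):
        """Open incidents that have passed their resolution target."""
        return [i for i in self.queue() if now > i.due_at]


notified = []
log = IncidentLog(notify=notified.append)
start = datetime(2024, 1, 1, 9, 0)
duplicates = log.report("Duplicate customer records", "billing", 1, False, start)
postal = log.report("Missing postal codes halt shipping", "logistics", 1, True,
                    start + timedelta(hours=1))

for incident in log.queue():
    print(incident.priority, incident.description, "due", incident.due_at)
```

Even a toy version like this shows why the prioritization rules need to be agreed on up front: the shipping issue was reported later but jumps to the head of the queue because it stops work.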