Where to Start with Data Quality

Blog Administrator | Analyzing Data, Analyzing Data Quality, Data Cleansing, Data Quality

By David Loshin

In a previous post I noted that there are two common questions about a data quality program. My last post dealt with the first, developing a business justification, so this week we will look at the second: how do we get started?

I believe that there are three key tasks that must be done at the beginning of a data quality initiative, all intended to help laser-focus the plan to best address the most critical needs:

  1. Solicit data quality requirements from the business users. Don’t directly ask the users about their data quality rules. Instead, devise a standardized approach to interviewing the business users about the way they rely on information within the context of their day-to-day activities. Document their description of their data use, extract their key dependencies, synthesize some quantifiable measures, and then reflect those measures back to them to validate your analysis.
  2. Perform a data quality assessment. Combine the use of statistical profiling tools and qualitative review to assess the degree to which data sets comply with user expectations and to find potential anomalies that require closer inspection to determine criticality.
  3. Establish a process and repository for data quality incident reporting and tracking. Provide a centralized management process that allows the users to report issues as they are identified, automate data practitioner notification, provide a set of rules for prioritization and evaluation, and then provide a workflow management scheme that ensures that high priority issues are addressed within a reasonable time frame.
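The assessment step above can be sketched in a few lines of code. This is only a minimal illustration, assuming a hypothetical customer record layout and made-up expectation rules; a real assessment would run a profiling tool over full data sets:

```python
# A minimal sketch of step 2, a data quality assessment: profile a data set
# against a few user-derived expectations. The field names, records, and
# validity rule are hypothetical illustrations, not from any real system.

import re

records = [
    {"customer_id": "C001", "email": "ann@example.com", "age": 34},
    {"customer_id": "C002", "email": "", "age": 17},
    {"customer_id": "", "email": "bob@example", "age": 51},
]

def completeness(records, field):
    """Fraction of records with a non-empty value for `field`."""
    filled = sum(1 for r in records if r.get(field) not in ("", None))
    return filled / len(records)

def validity(records, field, rule):
    """Fraction of records whose value for `field` satisfies `rule`."""
    valid = sum(1 for r in records if rule(r.get(field)))
    return valid / len(records)

# A deliberately simple email shape check, standing in for a real rule.
email_ok = lambda v: bool(v) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None

report = {
    "customer_id completeness": completeness(records, "customer_id"),
    "email validity": validity(records, "email", email_ok),
}
for measure, score in sorted(report.items()):
    print(f"{measure}: {score:.0%}")
```

Reflecting scores like these back to the business users, as step 1 suggests, is what validates whether the measures actually capture their expectations.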

In essence, these three steps give you a current-state assessment of data quality levels, document the most critical issues, and help prioritize the methods by which those issues are investigated and resolved.





Assembling a Data Quality Management Framework

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Management, Data Quality

By David Loshin

There are two dominant questions that I am asked over and over again when people are in the process of creating a program for data quality management.

The first is “how can you develop a business justification for a data quality program?” and the second is “how do we get started?” We are currently working with one customer who seems ready to commit to instituting a data quality management program, yet remains somewhat resistant for lack of an answer to the first question, and confused about planning for lack of an answer to the second.

Let me clarify the scenario somewhat: this is a large organization that has over the years empowered their user community with an atypical degree of data freedom. At the same time, they have a widely distributed management structure for information technology development. The result is, as you can imagine, some controlled chaos.

There are data validation routines here and there for extracting data (from one or more sources) and loading it into a target system. But these routines are completely distinct and non-standardized, to the point that, in the few places where anyone actually looks at the validation scores, it is a challenge to make sense of them when assessing data quality and usability.

Luckily, there is a new initiative for considering enterprise-level data services, and data quality has emerged as one of the potential foundations of this service strategy. In my upcoming post, we will look at some aspects of the business justification to be used in socializing the value proposition of data quality improvement.





Where Do You Fit In?

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Management, Data Quality

By Elliot King

Too often, those of us with our noses to the grindstone have no time to look up. We are so busy putting out fires, monitoring and maintaining what we have, or trying to launch new initiatives that we never look around to see how other organizations are dealing with similar issues.

This may be particularly true in the data quality world. Data quality is often seen as an internal problem and it is often addressed differently in different settings, both organizationally and technically. Indeed, even the terminology is not consistent across industries.

So a recent study conducted by the International Association for Information and Data Quality (IAIDQ), working in conjunction with the Information Quality Program at the University of Arkansas at Little Rock (UALR-IQ), reveals some very interesting trends. The survey of 270 data quality professionals identified the top challenges facing the field.

Heading the list is a lack of accountability and responsibility for data quality, followed by too many data and information silos to manage, a lack of awareness and discussion of the size and impact of data quality problems, and a lack of understanding of what data quality means. These challenges are fundamental, and each was cited by more than 50 percent of the respondents.

Considering the basic nature of the challenges, perhaps it should be no surprise that 66 percent of the respondents believed that the effectiveness of the data quality efforts in their organization was only OK (some goals were met) or poor (few goals were met). Ironically, 70 percent claimed that their organizations recognized that data and information were important strategic assets and managed them with that in mind.

So what is driving companies to improve their data quality efforts? According to the survey, the top driver is just a general desire to improve the quality of data, which was cited by 68 percent of the respondents. Other important motivations to improve data quality were the desire to improve business intelligence, and compliance and legal considerations.




So What Keeps You Up at Night?

Blog Administrator | Analyzing Data, Analyzing Data Quality, Data Quality

By Elliot King

In one of his most memorable comments, former Secretary of Defense Donald Rumsfeld remarked that in assessing threats to the United States he worried about three kinds of things: the known knowns, threats he knew the U.S. was facing; the known unknowns, threats he knew too little about and had to investigate further; and the unknown unknowns, threats he didn’t know existed at all, and for which the country could not prepare. It was the unknown unknowns that really kept him up at night. Not knowing what he didn’t know was the most worrisome.

Rumsfeld had the heavy burden of safeguarding the security of the United States
on his shoulders. And while not nearly as weighty, data quality specialists have
the responsibility of safeguarding the integrity of their organizations’ data.
For them, the easy set of questions is the known unknowns.

Since data quality is generally assessed by five or six characteristics, data
quality specialists should constantly be asking themselves questions that relate
to those characteristics. For example, is the data your company uses correct and
complete? Do you have enough data to achieve your business objectives? Is your
data being updated at appropriate intervals (or asked another way, what is the
rate of data quality decay)? And so on.
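Two of those known-unknown questions can be turned into routine measurements. The sketch below is a hedged illustration: the record layout and the 90-day freshness threshold are assumptions, chosen only to show completeness and a crude decay check side by side:

```python
# Illustrative only: measure completeness of a field and count records that
# have gone "stale" (not verified within an assumed 90-day window). The
# records and threshold are hypothetical.

from datetime import date, timedelta

today = date(2024, 6, 1)  # fixed date so the example is reproducible
records = [
    {"phone": "555-0100", "last_verified": date(2024, 5, 20)},
    {"phone": None,       "last_verified": date(2023, 11, 2)},
    {"phone": "555-0199", "last_verified": date(2024, 1, 15)},
]

# Completeness: what fraction of records carry a phone number at all?
complete = sum(1 for r in records if r["phone"]) / len(records)

# Decay: how many records have not been re-verified within 90 days?
stale = sum(1 for r in records if today - r["last_verified"] > timedelta(days=90))

print(f"phone completeness: {complete:.0%}")
print(f"records not verified in 90 days: {stale} of {len(records)}")
```

Running checks like these on a schedule converts a known unknown (how fast is our data decaying?) into a known known.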

Unfortunately, depending on how the data quality program within an organization
is structured, many of those responsible for data quality live in a world of
unknown unknowns. For example, what is the source of data decay? If you import
data from external sources, what is the quality control mechanism in place at
the source? What new initiatives does your organization have in mind that might
have an impact on overall data quality? What new business processes may be
planned that will have an impact on data quality? Is your data quality program
flexible enough to accommodate sudden changes?

Finding the answers to the known unknowns of data quality should be an integral
part of any data quality program. Trying to recognize what you don’t even know
that you don’t know requires imagination and perseverance. And trying to
understand that broader horizon can keep people up at night.
 




Business Rules Rule

Blog Administrator | Analyzing Data, Analyzing Data Quality, Data Management, Data Quality

By Elliot King

Back in the day when television sets were still built in America, the Zenith Corp. ran an ad that proclaimed that the quality went in before the name went on. Okay, at some point Zenith was trying to gloss over the fact that the company had fallen behind in automation and a lot of their manufacturing process was still conducted by hand. But it is the thought that counts. If the parts of a whole are not right, the whole is not going to be right either.

 

Business rules are what define and constrain the critical elements of
databases by controlling the information to be captured. Business rules
define each column in a database, the conditions under which data should be
entered into a specific column, and the relationships between different
data elements.

In some ways, business rules provide the logical structure of the database,
determining what the data elements are and how they fit together. For
example, if a database were thought of as a house, business rules would
define what constitutes a window, how windows fit into walls and, as
importantly, how windows are to be used. A more real-world example might
be: nobody under the age of 18 may open an account.
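That account-opening rule can be written down as an explicit, testable check. The sketch below is only an illustration; the `AccountRequest` shape and field names are assumptions, not any particular system’s schema:

```python
# A hypothetical business rule made explicit in code: nobody under the age
# of 18 may open an account. The AccountRequest type is an assumption for
# the sake of the example.

from dataclasses import dataclass

MINIMUM_ACCOUNT_AGE = 18

@dataclass
class AccountRequest:
    name: str
    age: int

def may_open_account(request: AccountRequest) -> bool:
    """Business rule: an applicant must be at least 18 years old."""
    return request.age >= MINIMUM_ACCOUNT_AGE

print(may_open_account(AccountRequest("Ana", 19)))  # True
print(may_open_account(AccountRequest("Ben", 17)))  # False
```

Writing the rule as a named function, rather than burying it in a form handler, is what makes it something you can audit against later.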

Accurate, concise, consistent and precise business rules are essential for
two primary reasons: they set expectations for what a database should
contain, and they govern how the data can be used. Let’s say you are
capturing data via a Web form. Business rules will define which fields are
required. Or imagine that you want to launch a direct marketing campaign
aimed at people who are unmarried. Business rules would dictate which
marital status categories to include: perhaps single, divorced and widowed.

The problem with business rules is that they usually are developed all over the
organization and there are a lot of them. They can be found in user guides,
system documentation and data entry guidelines. Business rules can be developed
by a host of different people including subject matter experts and systems
developers. And they specify data at a very granular level, such as an employee
termination date must be later than an employee hire date.

Often the first step in a data quality program is to consolidate an
organization’s business rules. Data can then be audited according to the
complete set of rules, mistakes can be defined and corrected, or the rule in
question can be refined.
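The consolidation-and-audit step might look like the following sketch, in which rules scattered across the organization are gathered into one registry and every record is checked against the complete set. The rule names and employee records here are hypothetical:

```python
# Illustrative sketch: consolidate scattered business rules into a single
# registry, then audit records against the complete set. Rules and records
# are made-up examples, including the "termination date must be later than
# hire date" rule mentioned above.

from datetime import date

rules = {
    "termination_after_hire": lambda r: (
        r.get("termination_date") is None or r["termination_date"] > r["hire_date"]
    ),
    "hire_date_present": lambda r: r.get("hire_date") is not None,
}

employees = [
    {"id": 1, "hire_date": date(2019, 3, 1), "termination_date": None},
    {"id": 2, "hire_date": date(2021, 6, 15), "termination_date": date(2020, 1, 4)},
]

def audit(records, rules):
    """Return a list of (record id, rule name) for every violation found."""
    return [
        (r["id"], name)
        for r in records
        for name, check in rules.items()
        if not check(r)
    ]

print(audit(employees, rules))  # [(2, 'termination_after_hire')]
```

Each violation found this way is either a data mistake to correct or, as the paragraph above notes, a signal that the rule itself needs refinement.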

To paraphrase Shakespeare, the problem sometimes is not in the data, dear
Brutus, but in the rules.