Moving to Action

By Elliot King The first step in a data quality program is to assess your data. Whether you opt for data profiling or some other assessment mechanism, this part of the process consists of systematically identifying exactly where the problems can be found in your data sets. While assessment is obviously the first step, it should be just as obvious…

Continue Reading

Improving Identity Resolution and Matching via Structure, Standards, and Content

By David Loshin One of the most frequently performed activities associated with customer data is searching: given a customer's name (and perhaps some other information), looking up that customer's records in databases. This leads to an enduring challenge for data quality management, which supports finding the right data through record matching, especially when you don't have all the data…

Continue Reading
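The record-matching challenge described above can be sketched minimally in Python. The similarity measure, threshold, and sample records here are illustrative assumptions, not the method from the article:

```python
# A minimal sketch of approximate customer-record matching over an
# in-memory list of records. The 0.85 threshold is an assumed value.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_customer(name: str, records: list[dict], threshold: float = 0.85) -> list[dict]:
    """Return records whose 'name' field approximately matches the query."""
    return [r for r in records if similarity(name, r["name"]) >= threshold]

records = [
    {"id": 1, "name": "Jonathan Smith"},
    {"id": 2, "name": "Jon Smith"},
    {"id": 3, "name": "Mary Jones"},
]
# A misspelled query still finds the closest record.
matches = match_customer("Jonathon Smith", records)
```

Real identity resolution uses richer techniques (phonetic encoding, standardized name tokens, multiple fields), but the core pattern of scoring candidate records against a threshold is the same.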

Designing More Effective Data Cleansing Rules

By David Loshin In my last post, we looked at a simple data transformation and cleansing rule intended to standardize the representation of a street type. We found that an uncontrolled application of the rule made changes where we didn't want any change to happen. There are two reasons why applying the rule led to…

Continue Reading
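The pitfall of uncontrolled rule application can be illustrated with a small Python sketch. The "ST" → "STREET" mapping is a hypothetical example, not the specific rule discussed in the article:

```python
# Contrast an uncontrolled substring rule with a context-aware version.
import re

TRICKY = "456 ST CHARLES ST"

# Uncontrolled: replaces every occurrence, corrupting the street name
# "ST CHARLES" as well as expanding the street type.
naive = TRICKY.replace("ST", "STREET")

def expand_street_type(address: str) -> str:
    """Expand 'ST' only where it appears as the final token of the
    address line, where a street-type abbreviation is expected."""
    return re.sub(r"\bST$", "STREET", address)

# Controlled: only the trailing street-type token is changed.
fixed = expand_street_type(TRICKY)
```

Constraining where a rule may fire (here, to the position where a street type is expected) is one simple way to keep a cleansing rule from changing values it was never meant to touch.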

Data Cleansing and Simple Business Rules

By David Loshin Having worked as a data quality tool software developer, rules developer, and consultant, I am relatively familiar with some of the idiosyncrasies associated with building an effective business rule set for data standardization and, particularly, data cleansing. At first blush, the process seems relatively straightforward: I have a data value in a character string that I believe…

Continue Reading

Address Quality – Take 2

By David Loshin We have dealt with some of our core address quality concepts, but not this one: The intended recipient must be associated with the deliverable address. The problem here is no longer address quality but rather address correctness. The address may be complete, all the elements may be valid, the ZIP+4 is the right one, and all values…

Continue Reading

Characterizing the Quality of Address Data

By David Loshin My company is currently working on a couple of projects associated with address quality and location master data. We are reviewing a lot of the existing documentation that has been collected from a number of different operational systems, as well as reviewing the business processes to see where location data is created, modified, or read. And…

Continue Reading

Sometimes Data Quality is the Law

By Elliot King We have all read the statistics about the real costs that poor data quality represents. And intuitively, we know that bad data is, well, bad. But, in many cases, bad data is more than just bad for business. Increasingly, good data is required by law. In 2001, the U.S. Congress added two lines to its major appropriations…

Continue Reading

Clean Data is Good Data

By Elliot King The cliché is as old as computing itself: garbage in, garbage out. And that cliché is as true now as ever, if not more so. Unfortunately, with information flowing into companies from so many sources, including the Web and third-party providers, mistakes should not just be expected; they are basically inevitable. Garbage data is going to get in…

Continue Reading

The Relative Distinction of Address Validation, Precision, and Accuracy

By David Loshin One nice thing about addresses, especially in the United States, is that they have well-defined standards. In previous blog series, I have looked at the process of address standardization and correction, so I won't belabor that point. However, many people confuse the differences among a valid address, a precise representation of an address, and an accurate address.…

Continue Reading

Standardizing Your Approach to Monitoring the Quality of Data

By David Loshin In my last post, I suggested three techniques for maturing your organizational approach to data quality management. The first recommendation was defining processes for evaluating errors when they are identified. These processes involve a few key techniques: 1) An approach to specifying data validity rules that can be used to determine whether a data…

Continue Reading
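The idea of specifying data validity rules can be sketched minimally in Python. The records and the two rules shown (a hypothetical ZIP format check and a non-blank state check) are illustrative assumptions, not rules from the article:

```python
# A minimal sketch of data validity rules as named predicates over
# simple dict records.
import re
from typing import Callable

# Each rule pairs a descriptive name with a predicate on a record.
VALIDITY_RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("zip_is_5_digits", lambda r: re.fullmatch(r"\d{5}", str(r.get("zip", ""))) is not None),
    ("state_not_blank", lambda r: bool(str(r.get("state", "")).strip())),
]

def evaluate(record: dict) -> list[str]:
    """Return the names of every validity rule the record violates."""
    return [name for name, rule in VALIDITY_RULES if not rule(record)]

# A record with a short ZIP and a blank state violates both rules.
violations = evaluate({"zip": "1234", "state": " "})
```

Keeping rules named and separate from the evaluation loop makes it straightforward to report which rule each erroneous record failed, which is the kind of error-evaluation process the post describes.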