Reflections: The Challenges of Master Data Resolution

By David Loshin

I have worked for almost fifteen years on what would today be called master data management. I recall the first significant project involved unique identification of individuals based on records pulled from about five different sources, and there were three specific challenges:
  1. Determination of identifying attributes – specifying the data elements that, when composed together, provide enough information to differentiate between records representing different entities;
  2. Identity resolution in the presence of variation – having the right algorithms, tools, and techniques for using the identifying attribute values to search for and find matching records among a collection of source data sets; and
  3. Performance management – tuning the algorithms and tools properly to ensure (as close as possible to) linear scalability as the volumes of data grow.
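
The first two challenges can be illustrated with a minimal sketch. It assumes a hypothetical set of identifying attributes (first_name, last_name, birth_date) and uses the generic string similarity from Python's standard difflib module as a stand-in for a real matching algorithm; the function names and the 0.85 threshold are illustrative choices, not taken from the project described above.

```python
from difflib import SequenceMatcher

# Hypothetical identifying attributes; a real project would choose these per domain.
IDENTIFYING_ATTRIBUTES = ("first_name", "last_name", "birth_date")


def normalize(value):
    """Lowercase and collapse whitespace so trivial variation does not lower the score."""
    return " ".join(str(value).lower().split())


def attribute_similarity(a, b):
    """Return a similarity in [0, 1] between two attribute values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_similarity(rec_a, rec_b, attributes=IDENTIFYING_ATTRIBUTES):
    """Average the per-attribute similarities over the identifying attributes."""
    scores = [
        attribute_similarity(rec_a.get(name, ""), rec_b.get(name, ""))
        for name in attributes
    ]
    return sum(scores) / len(scores)


def find_matches(query, records, threshold=0.85):
    """Return the records whose similarity to the query meets the (assumed) threshold."""
    return [rec for rec in records if record_similarity(query, rec) >= threshold]


if __name__ == "__main__":
    sources = [
        {"first_name": "John", "last_name": "Smith ", "birth_date": "1970-01-01"},
        {"first_name": "Mary", "last_name": "Jones", "birth_date": "1982-05-17"},
    ]
    query = {"first_name": "Jon", "last_name": "Smith", "birth_date": "1970-01-01"}
    for match in find_matches(query, sources):
        print(match)
```

Comparing the query against every record, as this sketch does, is exactly what the third challenge warns about: the work grows with the size of the source data, so a production system would typically narrow the candidates first (for example by grouping records on a coarse key) before scoring them.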
Business Rules Rule

By Elliot King

Back in the day when television sets were still built in America, the Zenith Corp. ran an ad proclaiming that the quality went in before the name went on. Okay, at some point Zenith was trying to gloss over the fact that the company had fallen behind in automation and much of its manufacturing process was still conducted by hand.
Approximate Matching

By David Loshin

Actually, my first name is not David – that is really my middle name, but it is the given name my parents used when talking to me. This has led to a lot of confusion over the years, especially when I am confronted with a form asking for my “first name” and my “last name.” For official forms (like my driver’s license) I use my real first name as my “first name,” but for non-official forms I often just use David.
Record Linkage and Data Enhancement

By David Loshin

In my last two posts we looked at the distribution of information about entities and the use of record linkage to find corresponding data records in different data sets that can be linked together. Record linkage can be used for a number of processes that we bundle under the concept of “data enhancement,” which we’ll use to describe any methods for improving the value and usefulness of information.
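
As a rough illustration of record linkage used for enhancement, the sketch below links records from one data set to a reference data set and copies over values the first set is missing. The linkage key (normalized name plus postal code), the field names, and the function names are all hypothetical; an exact-key join is used here only for brevity, where a real linkage process would usually tolerate variation in the key values.

```python
def link_key(record):
    """Hypothetical linkage key: normalized name plus postal code."""
    return (record["name"].strip().lower(), record["postal_code"].strip())


def enhance(base_records, reference_records, fields):
    """Return copies of base_records with missing fields filled from linked reference records."""
    # Index the reference data set by linkage key for constant-time lookup.
    index = {link_key(rec): rec for rec in reference_records}
    enhanced = []
    for rec in base_records:
        merged = dict(rec)  # copy, so the input records are left unchanged
        match = index.get(link_key(rec))
        if match is not None:
            for field in fields:
                # Only fill in values the base record is missing.
                if not merged.get(field) and match.get(field):
                    merged[field] = match[field]
        enhanced.append(merged)
    return enhanced


if __name__ == "__main__":
    customers = [
        {"name": "Pat Example", "postal_code": "20001", "phone": ""},
        {"name": "Lee Sample", "postal_code": "10001", "phone": "555-0100"},
    ]
    reference = [
        {"name": " pat example", "postal_code": "20001", "phone": "555-0199"},
    ]
    for row in enhance(customers, reference, fields=["phone"]):
        print(row)
```

Existing values are never overwritten in this sketch; whether the reference source should take precedence over the base record is a policy decision that depends on how much each source is trusted.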