Record Linkage and Data Enhancement

By David Loshin

In my last two posts we looked at the distribution of information about entities and the use of record linkage to find corresponding data records in different data sets that can be linked together. Record linkage can be used for a number of processes that we bundle under the concept of “data enhancement,” which we’ll use to describe any methods for improving the value and usefulness of information. In this post, we’ll look at three different types of enhancement:

· Data cleansing – The first type of enhancement is relatively straightforward:
our idea is to link records together for the purposes of cleansing the data, or
making it more suitable for use. Often, one data set may have a more trustworthy
representation of an entity, or we may have more than one data set, each
potentially containing overlapping data elements such as birth date, address,
and telephone number. By linking two different records, you can compare the
corresponding values, find those that are of better quality (e.g., more complete
or more current values), and update the “delinquent” record with the
higher-quality values.

· Enrichment – Existing records for entities (such as people or products) can be
matched against other data sets with additional reference information. For
example, you might want to match your customer data with a credit bureau’s data
and enrich your own data set with each individual’s credit ratings.

· Merge/Purge – Duplicate records entered into one data set often plague a
business’s attempts to actively manage customer accounts. Applying the record
linkage methodology to the records in a single data set helps find multiple
records that refer to the same individual. These records can be presented to a
data analyst, who reviews them, determines the surviving record, and updates
that record with the highest-quality values.
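
To make the three types of enhancement concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the sample records, and especially the link_key function, which stands in for a real record linkage step by matching on a normalized name plus birth date. The survivorship rule (keep the most complete record, then fill its gaps from the others) is likewise just one simple choice; it only looks at completeness, not at how current a value is, and in practice an analyst may review the candidates first.

```python
from collections import defaultdict

def is_populated(value):
    """Treat None and empty strings as missing values."""
    return value not in (None, "")

def link_key(record):
    """Hypothetical match key: normalized name plus birth date."""
    return (record["name"].strip().lower(), record["birth_date"])

def cleanse(target, source):
    """Data cleansing: fill missing values in target from a linked source record."""
    merged = dict(target)
    for field, value in source.items():
        if not is_populated(merged.get(field)) and is_populated(value):
            merged[field] = value
    return merged

def enrich(records, reference, new_field):
    """Enrichment: copy new_field from the reference set onto each linked record."""
    ref_by_key = {link_key(r): r for r in reference}
    enriched = []
    for rec in records:
        match = ref_by_key.get(link_key(rec))
        out = dict(rec)
        out[new_field] = match[new_field] if match else None
        enriched.append(out)
    return enriched

def merge_purge(records):
    """Merge/purge: group records by link key and keep one survivor per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[link_key(rec)].append(rec)
    survivors = []
    for group in groups.values():
        # Assumed survivorship rule: start from the most complete record,
        # then fill any remaining gaps from the other duplicates.
        ranked = sorted(
            group,
            key=lambda r: sum(is_populated(v) for v in r.values()),
            reverse=True,
        )
        survivor = ranked[0]
        for other in ranked[1:]:
            survivor = cleanse(survivor, other)
        survivors.append(survivor)
    return survivors

# Made-up sample data: two records for the same customer, one for another.
customers = [
    {"name": "Ann Lee", "birth_date": "1980-02-01", "phone": "", "address": "12 Oak St"},
    {"name": "ann lee ", "birth_date": "1980-02-01", "phone": "555-0100", "address": ""},
    {"name": "Bob Ray", "birth_date": "1975-07-15", "phone": "555-0199", "address": "9 Elm Ave"},
]
bureau = [{"name": "Ann Lee", "birth_date": "1980-02-01", "credit_rating": "A"}]

deduped = merge_purge(customers)
enriched = enrich(deduped, bureau, "credit_rating")
```

In this sketch the two Ann Lee records collapse into a single survivor that carries both the address and the phone number, and only that survivor picks up a credit rating from the reference data. An exact-match key like this one will miss many true duplicates; the matching step is where a real record linkage method would go.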

There are many variations on these themes. For example, merge/purge can be used
for combining customer data sets after a corporate acquisition; enrichment can
be used to institute a taxonomic hierarchy for customer classification and
segmentation. Loosening the matching rules for merge/purge can help with a
process called “householding,” which attempts to identify individuals with some
shared characteristics (such as “living in the same house”).
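
As a rough illustration of what “loosening the matching rules” can mean for householding, the sketch below groups people by a normalized address alone instead of by a key that identifies one individual. The household_key function, the field names, and the sample records are all hypothetical, and real address matching usually needs far more standardization than lowercasing and collapsing spaces.

```python
from collections import defaultdict

def household_key(record):
    """Hypothetical loosened key: the address only, lowercased with spaces collapsed."""
    return " ".join(record["address"].lower().split())

def households(records):
    """Group individuals who share the same loosened key."""
    groups = defaultdict(list)
    for rec in records:
        groups[household_key(rec)].append(rec["name"])
    return dict(groups)

# Made-up sample data: two people at one address, one at another.
people = [
    {"name": "Ann Lee", "address": "12 Oak St"},
    {"name": "Tom Lee", "address": "12  oak st"},
    {"name": "Bob Ray", "address": "9 Elm Ave"},
]

result = households(people)
```

Here Ann Lee and Tom Lee land in the same household even though they are different individuals, which is exactly the kind of match that a stricter merge/purge rule is meant to reject.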