By David Loshin

What I have found to be the most interesting byproduct of record linkage is the ability to make explicit those facts about individuals that are obscured because the data is distributed across different data sets. As an example, consider these records, taken from different data sets:

A:
David
Loshin
301-754-6350
1163 Kersey Rd
Silver Spring
MD
20902

B:
Knowledge Integrity, Inc
1163 Kersey Rd
Silver Spring
MD
20902

C:
H David
Lotion
1163 Kersey Rd
Silver Spring
MD
20902

D:
Knowledge Integrity, Inc.
301
7546350
7546351
MD
20902

We could establish a relationship between record A and records B and C because
all three share the same street address. We could establish a relationship between
record B and record D because the company names match, apart from a trailing
period.

Therefore, by transitivity, we can infer a relationship between “David Loshin”
and the company “Knowledge Integrity, Inc” (A links to B, B links to D,
therefore A links to D). However, none of these records alone explicitly shows
the relationship between “David Loshin” and “Knowledge Integrity, Inc” – that is
inferred knowledge.
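A minimal Python sketch of that chain of reasoning follows. It is only an illustration, not a description of any particular linkage tool: the field names (`street`, `company`), the exact-match-after-normalization rule, and the union-find grouping are all assumptions made for this example, and only the fields used for matching are carried over from the sample records.

```python
import string
from itertools import combinations

# Sample records reduced to the fields this sketch compares.
# The field names are hypothetical, chosen only for illustration.
records = {
    "A": {"name": "David Loshin", "street": "1163 Kersey Rd"},
    "B": {"company": "Knowledge Integrity, Inc", "street": "1163 Kersey Rd"},
    "C": {"name": "H David Lotion", "street": "1163 Kersey Rd"},
    "D": {"company": "Knowledge Integrity, Inc."},
}

def normalize(value):
    """Lowercase and drop punctuation so 'Inc' and 'Inc.' compare equal."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(value.lower().translate(table).split())

# Union-find structure: each record starts in its own group.
parent = {rid: rid for rid in records}

def find(rid):
    """Return the representative of the group containing rid."""
    while parent[rid] != rid:
        parent[rid] = parent[parent[rid]]  # path halving
        rid = parent[rid]
    return rid

def union(a, b):
    """Merge the groups containing a and b."""
    parent[find(a)] = find(b)

# Direct links: two records share a normalized street or company value.
direct_links = []
for a, b in combinations(records, 2):
    for field in ("street", "company"):
        va, vb = records[a].get(field), records[b].get(field)
        if va and vb and normalize(va) == normalize(vb):
            direct_links.append((a, b, field))
            union(a, b)
            break

def linked(a, b):
    """True if a and b are connected directly or through other records."""
    return find(a) == find(b)

print(direct_links)
print("A linked to D:", linked("A", "D"))
```

No direct link between A and D appears in `direct_links`, yet `linked("A", "D")` holds because both records end up in the same group by way of B. A real linkage process would use approximate matching and per-field confidence rather than exact comparison, so treat the matching rule here as a placeholder.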

You can probably see the opportunity here – basically, by merging a number of
data sets together, you can enrich all the records as a byproduct of exposed
transitive relationships.
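To make the enrichment idea concrete, here is a small hedged sketch in Python. It assumes that a group of records has already been linked (the `cluster` list is taken as given) and that inferred values should be kept apart from the original ones under a hypothetical `inferred` key; neither choice comes from a specific product or method.

```python
from collections import defaultdict

# A subset of the sample records; field names are hypothetical.
records = {
    "A": {"name": "David Loshin", "phone": "301-754-6350",
          "street": "1163 Kersey Rd"},
    "B": {"company": "Knowledge Integrity, Inc", "street": "1163 Kersey Rd"},
    "D": {"company": "Knowledge Integrity, Inc."},
}

# Assumed to be the result of a prior linkage step.
cluster = ["A", "B", "D"]

def enrich(records, cluster):
    """Give each record in the cluster the fields it lacks, as candidates."""
    # Pool every distinct value seen for each field across the cluster.
    pooled = defaultdict(list)
    for rid in cluster:
        for field, value in records[rid].items():
            if value not in pooled[field]:
                pooled[field].append(value)

    enriched = {}
    for rid in cluster:
        row = dict(records[rid])
        # Keep inferred values separate so their origin stays visible.
        row["inferred"] = {
            field: list(values)
            for field, values in pooled.items()
            if field not in records[rid]
        }
        enriched[rid] = row
    return enriched

enriched = enrich(records, cluster)
print(enriched["A"]["inferred"])
print(enriched["D"]["inferred"])
```

Record A gains candidate company names and record D gains a candidate name, phone, and street. Keeping these under a separate key is a design choice for this sketch: inferred values are only as reliable as the links that produced them, so it helps to be able to tell them apart from what each source actually stated.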

This gives us one more valuable type of enhancement that record linkage
provides. It is particularly valuable because the exposed knowledge can in
turn contribute to our other enhancement techniques for cleansing, enrichment,
and merge/purge.

