By David Loshin
One last thought: this approach is largely a “data-centric” activity. What I
mean is that it looks at and compares two records regardless of where those
records came from. They might have come from the same data set (as part of a
duplicate analysis) or from different data sets (for consolidation or general
But it does not take into consideration whether one data set models “customer”
data and another models “employee” data. While you may link a customer record
with an employee record based on a similarity analysis of a set of corresponding
data attributes, the contexts are slightly different.
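
To make the data-centric point concrete, here is a minimal Python sketch of a source-agnostic comparison. It is an illustration under stated assumptions, not any particular tool's method: the attribute names, the sample records, and the 0.85 threshold are all hypothetical, and the standard library's difflib.SequenceMatcher stands in for whatever similarity measure you actually use.

```python
from difflib import SequenceMatcher

# Hypothetical attributes that both data sets happen to share.
SHARED_ATTRIBUTES = ("name", "birth_date", "postal_code")

# Assumed cutoff for declaring a match; a real threshold would be tuned.
MATCH_THRESHOLD = 0.85


def attribute_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two values, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(rec_a: dict, rec_b: dict) -> float:
    """Unweighted mean of the attribute similarities.

    Nothing here knows (or cares) which data set either record came from.
    """
    scores = [attribute_similarity(rec_a[k], rec_b[k]) for k in SHARED_ATTRIBUTES]
    return sum(scores) / len(scores)


# One record from a "customer" data set, one from an "employee" data set.
customer = {"name": "Jon Smith", "birth_date": "1975-03-02", "postal_code": "20910"}
employee = {"name": "Jonathan Smith", "birth_date": "1975-03-02", "postal_code": "20910"}

score = record_similarity(customer, employee)
is_match = score >= MATCH_THRESHOLD
```

The comparison links the two records purely on attribute similarity. The fact that one describes a customer and the other an employee never enters the calculation, which is exactly the limitation in question.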
A match across the two data sets is a bit of a hybrid: we have matched the same
individual, but one playing two different roles. That introduces a different kind of
question: are the identifying attributes associated with the individual, or with
the individual acting in the role of “customer”? The same question applies to
individual vs. employee.
And finally, are there attributes of the roles that each individual plays that
can be used for unique identification within the role context? The answers to
these questions become important when matching and linkage are integrated as
part and parcel of a business application (such as the consolidation of data
being imported into a business intelligence framework).
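
One way to think about these questions is to keep the attributes that identify the individual separate from the attributes that identify the individual within a role. The sketch below is only one possible model, and every name in it (Individual, Role, LinkedEntity, build_role_index, the sample identifiers) is hypothetical. It assumes that a role-level identifier, such as a customer number or an employee ID, needs to be unique only inside its own role context.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Individual:
    """Attributes assumed to identify the person regardless of role."""
    name: str
    birth_date: str


@dataclass(frozen=True)
class Role:
    """Attributes that identify the person only within one role context."""
    role_type: str  # e.g. "customer" or "employee"
    role_id: str    # e.g. a customer number or an employee ID


@dataclass
class LinkedEntity:
    """One matched individual together with every role they play."""
    individual: Individual
    roles: list[Role] = field(default_factory=list)


def build_role_index(entities):
    """Map (role_type, role_id) to its linked entity.

    The key includes the role type, so the same ID value may appear
    in two different roles without colliding.
    """
    index = {}
    for entity in entities:
        for role in entity.roles:
            key = (role.role_type, role.role_id)
            if key in index:
                raise ValueError(f"duplicate identifier within role context: {key}")
            index[key] = entity
    return index


person = LinkedEntity(
    Individual("Jonathan Smith", "1975-03-02"),
    [Role("customer", "1001"), Role("employee", "1001")],
)
index = build_role_index([person])
```

Here the customer number and the employee ID happen to share the value "1001", yet both lookups resolve to the same linked individual, because uniqueness is enforced per role rather than globally.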