Modeling Issues and Entity Inheritance

Blog Administrator | Data Management, Data Quality, Fuzzy Matching, Record Linkage

By David Loshin

In our last set of posts, we looked at matching and record linkage and how approximate matching can be used to improve an organization’s view of “customer centricity.” Data quality techniques such as parsing, standardization, business-rule-based record linkage, and similarity scoring help in assessing the similarity between two records. The result of the similarity analysis is a score that indicates how likely it is that the two records refer to the same real-life individual or organization.
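To make the idea of a similarity score concrete, here is a minimal sketch in Python. It is an illustration only, not the method of any particular data quality tool: the `standardize` helper, the attribute names, and the weights are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever approximate matching algorithm a real product would use.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights; a real tool would derive these from business rules.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}


def standardize(value: str) -> str:
    """Lowercase and collapse whitespace: a crude stand-in for real standardization."""
    return " ".join(value.lower().split())


def similarity(rec_a: dict, rec_b: dict) -> float:
    """Return a weighted similarity score between 0 and 1 for two records."""
    score = 0.0
    for attr, weight in WEIGHTS.items():
        a = standardize(rec_a.get(attr, ""))
        b = standardize(rec_b.get(attr, ""))
        if not a or not b:
            # A missing value contributes nothing to the score.
            continue
        score += weight * SequenceMatcher(None, a, b).ratio()
    return score


rec_1 = {"name": "David  Loshin", "address": "12 Main St", "phone": "555-0100"}
rec_2 = {"name": "david loshin", "address": "12 Main Street", "phone": "555-0100"}
rec_3 = {"name": "Maria Gonzalez", "address": "98 Oak Avenue", "phone": "555-0199"}

likely_match = similarity(rec_1, rec_2)
unlikely_match = similarity(rec_1, rec_3)
```

The score only advises. Deciding what threshold counts as a match is still a business decision.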

One last thought: this approach is largely a “data-centric” activity. What I
mean is that it looks at and compares two records regardless of where those
records came from. They might have come from the same data set (as part of a
duplicate analysis) or from different data sets (for consolidation or general

But it does not take into consideration whether one data set models “customer”
data and another models “employee” data. While you may link a customer record
with an employee record based on a similarity analysis of a set of corresponding
data attributes, the contexts are slightly different.

A match across the two data sets is a bit of a hybrid: we have matched the
same individual, but in two different roles. That introduces a different kind of
question: are the identifying attributes associated with the “customer” or with
the individual acting in the role of “customer”? The same question applies to
the individual versus the “employee.”

And finally, are there attributes of the roles that each individual plays that
can be used for unique identification within the role context? The answers to
these questions become important when matching and linkage are integrated as
part and parcel of a business application (such as the consolidation of data
being imported into a business intelligence framework).
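One way to picture the distinction between an individual and the roles that individual plays is the sketch below. It is only an illustration under assumptions of my own: the class names, the attributes, and the `role_key` helper are hypothetical and are not drawn from any specific product or data model.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRole:
    customer_id: str        # identifies the individual only within the customer context
    shipping_address: str


@dataclass
class EmployeeRole:
    employee_id: str        # identifies the individual only within the employee context
    department: str


@dataclass
class Individual:
    # Attributes that identify the person regardless of role (hypothetical choice).
    full_name: str
    birth_date: str         # ISO date string, e.g. "1980-01-01"
    roles: list = field(default_factory=list)


def role_key(role) -> tuple:
    """Return an identifier that is unique only within the role's own context."""
    if isinstance(role, CustomerRole):
        return ("customer", role.customer_id)
    if isinstance(role, EmployeeRole):
        return ("employee", role.employee_id)
    raise TypeError(f"unknown role type: {type(role).__name__}")


# One matched individual playing two roles.
person = Individual("Pat Smith", "1980-01-01")
person.roles.append(CustomerRole("C-1001", "12 Main St"))
person.roles.append(EmployeeRole("E-2002", "Finance"))

keys = [role_key(r) for r in person.roles]
```

In this sketch the name and birth date belong to the individual, while the customer and employee identifiers belong to the roles. Whether a given attribute, such as an address, belongs to the person or to the role is exactly the modeling question raised above.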