Centricity and Connections: Clearing the Air
By David Loshin
There are opportunities to adjust your customer centricity strategy based on an understanding of the grouping relationships that bind individuals together, whether tightly or loosely. In the last post, we looked at some examples in which linking customer records into groups was straightforward because the values being compared and weighted for similarity were exact matches. When the values are not exact, some doubt creeps into the decision to include a record in a group.
Let’s revisit our example from my last post by adding in a new record for evaluation:
John Hansen, 1824 Polk Ave., Memphis TN 38177
Emily S. Hansen, 1824 Polk Ave., Memphis, TN 38177
Emily Stoddard, 1824 Polk Avenue, Memphis, TN
We had already decided that John and Emily shared a household, but all of a
sudden we have a third record with a name that shares some similarity with one
of the existing names and an almost exact street address match (note that the
third record is missing a ZIP code).
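To make "some similarity" a little more concrete, here is a minimal sketch of how a linkage routine might score pairs of these records. It uses Python's standard difflib module as a stand-in for whatever comparison function a real linkage tool would apply. The field weights and the choice to skip missing values are assumptions made purely for illustration, not a recommended configuration.

```python
from difflib import SequenceMatcher

# The three records from the example, split into fields.
records = [
    {"name": "John Hansen", "street": "1824 Polk Ave.", "zip": "38177"},
    {"name": "Emily S. Hansen", "street": "1824 Polk Ave.", "zip": "38177"},
    {"name": "Emily Stoddard", "street": "1824 Polk Avenue", "zip": None},
]

# Hypothetical weights: how much each field contributes to the overall score.
WEIGHTS = {"name": 0.4, "street": 0.4, "zip": 0.2}


def similarity(a, b):
    """Return a 0..1 string similarity, or None if either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(r1, r2):
    """Weighted average of field similarities, ignoring fields with missing values."""
    total = 0.0
    weight_used = 0.0
    for field, weight in WEIGHTS.items():
        s = similarity(r1[field], r2[field])
        if s is None:
            continue  # assumption: a missing ZIP neither helps nor hurts
        total += weight * s
        weight_used += weight
    return total / weight_used if weight_used else 0.0


for i in range(len(records)):
    for j in range(i + 1, len(records)):
        a, b = records[i], records[j]
        print(f"{a['name']} / {b['name']}: {score(a, b):.2f}")
```

A score like this does not settle whether Emily Stoddard belongs in the Hansen household. It only quantifies the doubt, and someone still has to decide where the threshold for inclusion sits.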
We could speculate that “Emily Stoddard” changed her name after she got married
to “John Hansen,” or that she changed an address somewhere as she moved from her
bachelorette pad to their newlywed home. But without exact knowledge of the
facts, it is only speculation, and one must exercise some care when relying on
speculation for business decisions.
If a few small differences pose a challenge to linkage, what would you make of
dozens, or even hundreds, of variations in names, locations, or other data?
Just as a case in point: in a hallway conversation at the recent Data Governance
Conference, a colleague mentioned that one of his customers’ databases had over
one hundred variations for a certain big-box retailer’s name! The conclusion you
can draw from this is that a key part of the record linkage process involves
some traditional data quality tactics, namely appending a standardized version
of the data to help your linkage algorithms score record similarity as a prelude
to establishing connectivity.
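As a rough sketch of what appending a standardized version might look like, the snippet below normalizes the street addresses from the earlier example. The suffix table is a tiny, hypothetical sample, and the field name street_std is invented for illustration. A production standardizer would rely on a full postal reference set and handle far more cases (for instance, "St" can mean "Street" or "Saint").

```python
import re

# Hypothetical, deliberately tiny abbreviation table for illustration only.
SUFFIXES = {"ave": "avenue", "st": "street", "rd": "road", "blvd": "boulevard"}


def standardize_street(street):
    """Lowercase, strip punctuation, and expand known suffix abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", street.lower()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


def append_standardized(record):
    """Return a copy of the record with a standardized street field appended.

    The original value is kept untouched so nothing is lost.
    """
    out = dict(record)
    out["street_std"] = standardize_street(record["street"])
    return out


raw = [
    {"name": "Emily S. Hansen", "street": "1824 Polk Ave."},
    {"name": "Emily Stoddard", "street": "1824 Polk Avenue"},
]
enriched = [append_standardized(r) for r in raw]

for r in enriched:
    print(r["name"], "|", r["street"], "|", r["street_std"])
```

Once both records carry the same standardized street value, the address comparison becomes an exact match again, and the remaining uncertainty is confined to the name.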