By David Loshin
In my last post, we began to look at the value proposition for organizing individual customers into logical groups. We began by looking at a grouping that generally appears naturally, namely the traditional residential household.
We talked about householding in a previous blog posting, but it is worth
reviewing the basic approaches used for determining that a group of individuals
share a household. The general approach is to analyze a collection of data
records and examine sets of identifying attributes for degrees of similarity in
naming and residence locations. Many situations are relatively straightforward,
such as this example:
John Hansen, 1824 Polk Ave., Memphis TN 38177
Emily S. Hansen, 1824 Polk Ave., Memphis, TN 38177
In this example, two individuals share both a last name and a location address,
and although the data evidence does not guarantee the truth of the inference, it
might be reasonable to suggest that because there is a link between the family
name and the residence location, these two individuals are members of the same
household. The algorithm, then, is to link records into collections of similar
records based on the similarity of their surname and residence characteristics.
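A minimal sketch of that linkage step follows, assuming records are simple (name, address) pairs and that the surname is the last token of the name. Python's `difflib.SequenceMatcher` ratio with a 0.9 threshold stands in for whatever similarity measure a real householding tool would use, and a small union-find structure turns pairwise links into groups. The function names and the third record are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def surname(full_name: str) -> str:
    # Simplifying assumption: the surname is the last token of the name.
    return normalize(full_name).split()[-1]


def similar(a: str, b: str, threshold: float) -> bool:
    """Character-level similarity using difflib's ratio (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def households(records: List[Tuple[str, str]],
               threshold: float = 0.9) -> List[Set[int]]:
    """Group record indices whose surnames and addresses are both similar."""
    parent = list(range(len(records)))  # union-find forest over indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair of records (quadratic; fine for a sketch).
    for i, j in combinations(range(len(records)), 2):
        (name_i, addr_i), (name_j, addr_j) = records[i], records[j]
        if (similar(surname(name_i), surname(name_j), threshold)
                and similar(normalize(addr_i), normalize(addr_j), threshold)):
            parent[find(i)] = find(j)  # merge the two records' groups

    groups: Dict[int, Set[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


records = [
    ("John Hansen", "1824 Polk Ave., Memphis TN 38177"),
    ("Emily S. Hansen", "1824 Polk Ave., Memphis, TN 38177"),
    ("Maria Lopez", "40 Elm St., Memphis, TN 38104"),  # hypothetical record
]
print(households(records))  # [{0, 1}, {2}]
```

Normalization makes the two Hansen addresses identical despite the missing comma, so they land in one household. Comparing every pair grows quadratically with the number of records, so larger data sets typically need some way of limiting which pairs are compared.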
However, the concept of grouping is not limited to conventional groups, since
there are many artificial groups formed as a result of shared interests or
similarities in profile criteria. For example, people interested in certain
sports car models often organize “fan clubs,” new mothers often organize toddler
play groups, and sports team fans are often rabid about their franchise.
Likewise, your company might want to create marketing campaigns that target sets
of individuals grouped together by demographic or psychographic attributes. In
these cases, you would adjust your algorithms to link records based on
similarity of the values in other sets of data attributes.
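One way to picture that adjustment is to express the linkage rule as one comparison function per attribute, so that changing the grouping criteria means swapping comparators rather than rewriting the linking code. This is a minimal sketch under that assumption; the attribute names (`age`, `interest`), the five-year age tolerance, and the sample customers are hypothetical.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Comparator = Callable[[Any, Any], bool]
Rule = Callable[[Record, Record], bool]


def make_rule(comparators: Dict[str, Comparator]) -> Rule:
    """Two records link only if every listed attribute passes its comparator."""
    def rule(a: Record, b: Record) -> bool:
        return all(compare(a[field], b[field])
                   for field, compare in comparators.items())
    return rule


def linked_pairs(records: List[Record], rule: Rule) -> List[Tuple[int, int]]:
    """Return index pairs of records that satisfy the linkage rule."""
    return [(i, j) for i, j in combinations(range(len(records)), 2)
            if rule(records[i], records[j])]


def same_interest(x: str, y: str) -> bool:
    return x.strip().lower() == y.strip().lower()


def similar_age(x: int, y: int) -> bool:
    return abs(x - y) <= 5  # hypothetical five-year tolerance


# Hypothetical customers with a demographic (age) and a psychographic
# (interest) attribute.
customers = [
    {"name": "A", "age": 34, "interest": "Sports Cars"},
    {"name": "B", "age": 37, "interest": "sports cars"},
    {"name": "C", "age": 62, "interest": "sports cars"},
    {"name": "D", "age": 33, "interest": "gardening"},
]

rule = make_rule({"interest": same_interest, "age": similar_age})
print(linked_pairs(customers, rule))  # [(0, 1)]
```

Customer C shares the interest but falls outside the age tolerance, and D is close in age but has a different interest, so only A and B link.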
Establishing the link may require more than the data that already exists in
your data set. You may need to append additional data acquired from outside sources.
And, interestingly enough, you will need to connect the acquired data to your
existing data, and that requires yet another record linkage effort. Apparently,
understanding customer collectives is pretty dependent on record linkage. And
while linking records is straightforward when all the data values line up
nicely, as you might suspect, there are some curious intricacies of linkage in
the presence of data with questionable quality.
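The sketch below illustrates that second linkage pass and why data quality matters. It assumes acquired records carry the same `name` and `address` fields as your own data; the `append_acquired` function, its key rules, and the sample acquired record are hypothetical. Comparing raw strings misses the match because of case and punctuation differences, while a normalized key catches it.

```python
import re
from typing import Dict, List

Record = Dict[str, str]


def match_key(rec: Record, normalize: bool = True) -> str:
    """Build a linkage key from name and address (hypothetical matching rule)."""
    raw = f"{rec['name']} {rec['address']}"
    if not normalize:
        return raw  # exact string comparison: brittle when quality varies
    # Lowercase, turn punctuation into spaces, collapse runs of whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", raw.lower()).split())


def append_acquired(existing: List[Record], acquired: List[Record],
                    normalize: bool = True) -> List[Record]:
    """Link acquired records to existing ones and append attributes they lack."""
    # If two acquired records share a key, the later one wins (simplification).
    index = {match_key(r, normalize): r for r in acquired}
    enriched = []
    for rec in existing:
        extra = index.get(match_key(rec, normalize), {})
        # Existing values take precedence; only missing attributes are added.
        enriched.append({**extra, **rec})
    return enriched


existing = [{"name": "John Hansen", "address": "1824 Polk Ave., Memphis TN 38177"}]
acquired = [{"name": "JOHN HANSEN",  # hypothetical acquired record
             "address": "1824 Polk Ave, Memphis, TN 38177",
             "interest": "sports cars"}]

print(append_acquired(existing, acquired)[0].get("interest"))  # sports cars
print(append_acquired(existing, acquired, normalize=False)[0].get("interest"))  # None
```

A single normalized key only absorbs cosmetic differences; misspellings, transposed digits, or outdated addresses would still call for similarity scoring rather than exact key lookup.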