By David Loshin

When inspecting two records for similarity (or for differentiation), the values in the identifying attributes from each corresponding record are compared to determine whether the two records can be presumed to represent the same entity or distinct entities.
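As a rough illustration of that comparison, the sketch below tallies how many identifying attributes agree, disagree, or are missing between two records. The `Person` type, the field names, and the `compare` helper are all hypothetical, and the normalization (trimming whitespace and ignoring case) is only one assumption about what should count as "the same" value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Person:
    # Hypothetical identifying attributes; real data sets will differ.
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    birth_date: Optional[str] = None  # assumed ISO format, e.g. "1980-04-12"


def normalize(value: Optional[str]) -> Optional[str]:
    """Trim whitespace and ignore case; treat empty strings as missing."""
    if value is None:
        return None
    cleaned = value.strip().casefold()
    return cleaned or None


def compare(a: Person, b: Person,
            fields=("first_name", "last_name", "birth_date")) -> Tuple[int, int, int]:
    """Return (agreements, disagreements, missing) across the given fields."""
    agree = disagree = missing = 0
    for field in fields:
        va = normalize(getattr(a, field))
        vb = normalize(getattr(b, field))
        if va is None or vb is None:
            missing += 1      # cannot compare: one side was never captured
        elif va == vb:
            agree += 1
        else:
            disagree += 1
    return agree, disagree, missing
```

How the three counts are turned into a "same entity" or "distinct entities" decision is a separate policy choice that this sketch deliberately leaves open.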

For people, there are some obvious attributes used for comparison – they are ones that are inherently associated with the individual, such as first name, last name, birth date, eye color, or birth location.

There are two issues with this limited set of attributes: in many cases, not all of that information has been captured, and as data sets grow, the variation decreases – many people may share the same first name or last name, and even more will share the same birth date. Therefore, the default is to consider additional attribute values that are directly associated with the individual.
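One way to see this effect in a given data set is to measure how many of the captured values for an attribute belong to only one record. The following is a minimal sketch under that assumption; the `uniqueness` function and the dictionary-based record layout are hypothetical, not a standard metric.

```python
from collections import Counter


def uniqueness(records, field):
    """Fraction of captured values for `field` that occur in exactly one record.

    Records where the field is missing or empty are ignored, which also
    makes the "not captured" problem visible as a smaller denominator.
    """
    values = [r[field] for r in records if r.get(field)]
    if not values:
        return 0.0
    counts = Counter(values)
    unique = sum(1 for v in values if counts[v] == 1)
    return unique / len(values)
```

A score close to 1.0 suggests the attribute still discriminates well between records; a score that falls as the data set grows is the decreasing variation described above.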

The most frequently used values are those associated with contact data such as residential address or telephone number, although technology evolutions have greatly broadened the spectrum of contact data attributes.

These include: email addresses (of which there may be both professional and private versions), handles used for social media interactions (like those used on Twitter or other online forums), IP addresses, various mobile telephone numbers, IP telephone numbers (including online-only numbers like those acquired via Google’s Voice service), as well as other assigned identifiers (such as account numbers or the numbers on your supermarket affinity card).

Contact information has become significantly more sophisticated, and some of the previous assumptions that supported its use in identification no longer necessarily hold true.

For example, for landline telephone numbers, the area code and exchange code could be correlated to a specific location and matched with postal codes. Today, telephone numbers are not only dissociated from location (e.g., a person can retain his Boston-based mobile number even after he moves anywhere else in the United States), they are even dissociated from telephones (such as virtual numbers connected directly to Internet-only systems).

Not only that, the advent of social communities allows one individual to create multiple personas, and a single identifier can be attached to more than one individual. I know of a person who has created multiple Twitter accounts, including one for each of her pets. Retail affinity cards can be shared among members of the same family.

Tracking web transactions by IP address groups multiple actions that could be performed by many people working on the same network and sharing the same Internet connection.

Using contact information for unique identification is a double-edged sword: there is a wider variety of data attributes and values to use, and they can add to the similarity analysis as well as the differentiation process.

However, you must take care that different values are not simply aliases for the same individual, and that a single value does not in fact represent multiple individuals.
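Both checks can be sketched as a simple audit of one contact attribute at a time. The example below assumes, purely for illustration, that first name, last name, and birth date together are good enough to tell individuals apart; the `audit_contact_values` function and the record layout are hypothetical.

```python
from collections import defaultdict


def audit_contact_values(records, contact_field):
    """Flag shared values and possible aliases for one contact attribute.

    Assumption: (first_name, last_name, birth_date) stands in for the
    individual. Returns two dicts:
      shared  - contact value -> set of individuals using it (more than one)
      aliases - individual -> set of contact values they use (more than one)
    """
    people_by_value = defaultdict(set)
    values_by_person = defaultdict(set)

    for r in records:
        value = r.get(contact_field)
        if not value:
            continue  # attribute not captured for this record
        person = (r["first_name"].casefold(),
                  r["last_name"].casefold(),
                  r["birth_date"])
        people_by_value[value].add(person)
        values_by_person[person].add(value)

    shared = {v: p for v, p in people_by_value.items() if len(p) > 1}
    aliases = {p: v for p, v in values_by_person.items() if len(v) > 1}
    return shared, aliases
```

A family affinity card would show up in `shared`, while a person with several Twitter handles would show up in `aliases`. Either result is a signal to weight that attribute more cautiously, not proof of an error in the data.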




