Improving Identity Resolution and Matching via Structure, Standards, and Content
By David Loshin
When applications allow free-form text to be entered into data elements with
ill-defined semantics, there is a risk that the stored values will not fully
conform to the expected data quality rules.
For example, many customer service representatives expect that when a customer
calls the company, there will be a record for that customer in the customer
database. If, however, the customer’s name was not entered exactly the same way
it is presented during a lookup, the record may not be found. This happens to me
a lot: I go by my middle name, “David,” and people often shorten it to “Dave”
when entering data, so when I later give my name as “David,” the search fails
because there is no exact match.
The same scenario occurs when the customer herself does not recall the data used
to create her electronic persona. In fact, how many times have you created a new
online account because you couldn’t remember your user ID? It is also important
to recognize that although we tend to think in terms of interactive lookups of
individual records, a huge amount of record matching is performed in bulk
operations, such as mail merges, merging data during corporate acquisitions,
eligibility validation, and claims processing, among many others.
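To make the “David”/“Dave” failure concrete, here is a minimal Python sketch.
The record layout, the NICKNAMES table, and the function names are all
hypothetical, chosen only for illustration. It contrasts an exact-match lookup
with one possible standardization approach: mapping known nicknames to a
canonical form before comparing.

```python
# Hypothetical reference table mapping common nicknames to a canonical first name.
# A production system would rely on a much larger, curated list of name variants.
NICKNAMES = {"dave": "david", "bob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize(value):
    """Trim surrounding whitespace and lowercase a value."""
    return value.strip().lower()


def canonical_first(name):
    """Normalize a first name, then replace a known nickname with its canonical form."""
    key = normalize(name)
    return NICKNAMES.get(key, key)


def exact_lookup(records, first, last):
    """Return only records whose stored names match the query character for character."""
    return [r for r in records if r["first"] == first and r["last"] == last]


def standardized_lookup(records, first, last):
    """Compare standardized forms, so 'Dave' and 'David' resolve to the same value."""
    first_key, last_key = canonical_first(first), normalize(last)
    return [
        r for r in records
        if canonical_first(r["first"]) == first_key and normalize(r["last"]) == last_key
    ]


# Hypothetical customer record captured with the shortened name.
records = [{"id": 1, "first": "Dave", "last": "Loshin"}]

print(exact_lookup(records, "David", "Loshin"))         # []
print(standardized_lookup(records, "David", "Loshin"))  # [{'id': 1, 'first': 'Dave', 'last': 'Loshin'}]
```

Standardization of this kind only helps with variations someone anticipated and
recorded in the reference table; a nickname missing from NICKNAMES still falls
back to a plain (case-insensitive) comparison.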
It is relatively easy to find a record when you have all the right data. As long
as the values used as search criteria are available and exactly match the ones
stored in the database, the application will find the record. The big
differentiator, though, is the ability to find those records even when some of
the values are missing or vary somewhat from those in the system of record. In
the next few postings we’ll dive a bit deeper into the types of variations and
then look at some approaches used to address them.
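As a rough illustration of “finding records even when values are missing or
vary,” the following Python sketch scores candidate records by string
similarity instead of requiring equality. It is an assumption-laden example,
not the approach the upcoming postings describe: it uses the standard library’s
difflib.SequenceMatcher as a generic similarity measure, skips fields that are
empty on either side, and treats the 0.8 threshold and the field names as
arbitrary, hypothetical choices.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0.0-1.0 similarity score, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(query, record, fields):
    """Average similarity over fields populated in both the query and the record.

    Missing values (None or empty) on either side are skipped rather than counted
    as mismatches, so incomplete search criteria can still locate a record.
    """
    scores = [
        similarity(query[f], record[f])
        for f in fields
        if query.get(f) and record.get(f)
    ]
    return sum(scores) / len(scores) if scores else 0.0


def tolerant_lookup(records, query, fields, threshold=0.8):
    """Return (score, record) pairs at or above the threshold, best match first."""
    scored = [(match_score(query, r, fields), r) for r in records]
    return sorted(
        (pair for pair in scored if pair[0] >= threshold),
        key=lambda pair: pair[0],
        reverse=True,
    )


# Hypothetical data: the stored first name varies and the query lacks a ZIP code.
records = [
    {"id": 1, "first": "Dave", "last": "Loshin", "zip": "20850"},
    {"id": 2, "first": "Diane", "last": "Lawson", "zip": "10001"},
]
query = {"first": "David", "last": "Loshin", "zip": None}

for score, record in tolerant_lookup(records, query, ["first", "last", "zip"]):
    print(round(score, 3), record["id"])  # only record 1 clears the threshold
```

Here “David” versus “Dave” scores about 0.67 on its own, but the exact last-name
match lifts the average above the threshold, while the missing ZIP code is
simply ignored. A generic character-based ratio like this knows nothing about
nicknames or phonetics, which is why it is only a stand-in for the
name-aware techniques a real matching engine would use.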