Approximate Matching

Blog Administrator | Analyzing Data, Data Management, Data Quality, Duplicate Elimination, Record Linkage

By David Loshin

Actually, my first name is not David; that is really my middle name, but it is the given name my parents used when talking to me. This has led to a lot of confusion over the years, especially when I am confronted with a form asking for my “first name” and my “last name.” For official documents (like my driver’s license) I use my real first name as my “first name,” but for unofficial forms I often just use David.

The Challenge of Identifying Information

Blog Administrator | Analyzing Data, Data Integration, Data Management, Data Quality, Record Linkage

By David Loshin

In my last post, I introduced the question of determining which characteristics are used to uniquely differentiate between any pair of records within a data set. The same question is relevant when attempting to match a pair of records, that is, to determine whether they represent the same entity. I call these characteristics “identifying attributes,” and the values they contain I call “identifying information.”
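To make the idea of identifying attributes a little more concrete, here is a minimal sketch (not the author's method) that searches a small data set for the smallest combinations of attributes whose values differ for every pair of records. It assumes records are Python dicts and treats "identifying" as exact value uniqueness; the function names and the sample records are hypothetical and exist only for illustration.

```python
from itertools import combinations


def is_identifying(records, attrs):
    """Return True if the values of `attrs` form a distinct tuple for every record."""
    seen = set()
    for rec in records:
        key = tuple(rec.get(a) for a in attrs)
        if key in seen:
            return False  # two records share the same values, so they cannot be told apart
        seen.add(key)
    return True


def minimal_identifying_sets(records, attributes):
    """List attribute combinations, smallest first, that distinguish every pair of records.

    Any combination that contains an already-found identifying set is skipped,
    so only minimal combinations are reported.
    """
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(f) <= set(combo) for f in found):
                continue
            if is_identifying(records, combo):
                found.append(combo)
    return found


# Hypothetical sample records, for illustration only.
records = [
    {"first": "Ann", "last": "Smith", "city": "Boston"},
    {"first": "Ann", "last": "Jones", "city": "Boston"},
    {"first": "Bob", "last": "Smith", "city": "Denver"},
]

print(minimal_identifying_sets(records, ["first", "last", "city"]))
# → [('first', 'last'), ('last', 'city')]
```

Keep in mind that a combination that happens to be unique in one sample is not guaranteed to stay unique as more records arrive, and the exhaustive search grows exponentially with the number of attributes, so this is only a way to explore the question on small data.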

Distributed Data and Distributed Information

Blog Administrator | Analyzing Data

By David Loshin

You might not realize how broad your electronic footprint really is. Do you have any idea how many data sets contain information about any specific individual? These days, any interaction you have with any organization is likely to be documented electronically. And if you are curious enough to read the fine print of the “privacy” policies, you might not be surprised to find that many of the organizations managing information about you are sharing that information with others.…