Content Standards for Data Matching and Record Linkage

By David Loshin

As I suggested in my last post, applying parsing and standardization to normalize data value structure will reduce complexity for exact matching. But what happens if there are errors in the values themselves? Fortunately, the same methods of parsing and standardization can be used for the content itself. This can address the types of issues I noted…

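To make the idea of standardizing content concrete, here is a minimal sketch of mapping variant tokens to one standard form before an exact comparison. The lookup tables and the function name `standardize_content` are hypothetical, invented for illustration; they are not taken from any particular data quality tool, and real reference data would be far larger and curated.

```python
# Hypothetical lookup tables (assumptions for illustration only).
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}
STREET_TYPES = {"st": "street", "st.": "street", "ave": "avenue", "rd": "road"}


def standardize_content(value, lookup):
    """Lowercase the value, split it into tokens, and replace each token
    that appears in the lookup table with its standard form."""
    tokens = value.lower().replace(",", " ").split()
    return " ".join(lookup.get(token, token) for token in tokens)


# Two differently written values compare equal once standardized.
same_person = (
    standardize_content("Bob Smith", NICKNAMES)
    == standardize_content("Robert Smith", NICKNAMES)
)
same_street = (
    standardize_content("123 Main St.", STREET_TYPES)
    == standardize_content("123 Main Street", STREET_TYPES)
)
```

A table lookup like this only handles known variants. Misspellings that are not in the table would still need some form of approximate matching.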

Normalizing Structure Using Data Standardization for Improved Matching

By David Loshin

In my last few posts, I discussed how structural differences impact the ability to search and match records across different data sets. Fortunately, most data quality tool suites use integrated parsing and standardization algorithms to map structures together. As long as there is some standard representation, we should be able to come up with a set of…

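As a rough illustration of mapping different structures onto one standard representation, the sketch below parses a name written either as "Last, First" or as "First Last" into the same two fields. The function `parse_name` and the two-field layout are assumptions made for this example, not the algorithm of any specific tool suite, and it ignores titles, suffixes, and multi-word surnames.

```python
def parse_name(raw):
    """Parse 'Last, First' or 'First Last' into a standard
    {'first': ..., 'last': ...} representation."""
    raw = " ".join(raw.split())  # collapse runs of whitespace
    if "," in raw:
        # 'Last, First' layout: the comma separates the two parts.
        last, first = [part.strip() for part in raw.split(",", 1)]
    else:
        # 'First Last' layout: treat the final token as the last name.
        first, _, last = raw.rpartition(" ")
    return {"first": first.title(), "last": last.title()}


# Both layouts map to the same standard record, so exact matching works.
records_match = parse_name("Smith, John") == parse_name("john  smith")
```

Once both sources are parsed into the same fields, the comparison itself can stay a simple equality check.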

Structural Differences and Data Matching

By David Loshin

Data matching is easy when the values are exact, but there are different types of variation that complicate matters. Let's start at the foundation: structural differences in the ways that two data sets represent the same concepts. For example, early application systems used data files that were relatively "wide," capturing a lot of information in each record,…
