Performance Scalability

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Integration, Data Management, Data Matching, Data Quality, Duplicate Elimination, MDM

By David Loshin

In my last post I noted that there is a growing need for continuous entity identification and identity resolution as part of the information architecture for most businesses, and that this need grows in proportion to the types and volumes of data that are absorbed from different sources and analyzed.

While I have discussed the methods used for parsing, standardization, and matching in past blog series, one thing I alluded to a few notes back was the need for better performance from these methods as data volumes grow.…
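One widely used way to keep matching tractable as volumes grow is "blocking": instead of comparing every record with every other record (n·(n-1)/2 comparisons), records are grouped by a cheap key and only compared within their group. The sketch below is an illustration of that general technique, not necessarily the approach taken in this series; the blocking key (first three letters of the last name plus the five-digit ZIP code) and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    # Hypothetical key: first three letters of the standardized last name
    # plus the five-digit ZIP code.
    last = record["last_name"].strip().upper()
    return (last[:3], record["zip"][:5])

def candidate_pairs(records):
    """Yield index pairs that share a blocking key, instead of all n*(n-1)/2 pairs."""
    blocks = defaultdict(list)
    for i, record in enumerate(records):
        blocks[blocking_key(record)].append(i)
    for members in blocks.values():
        # Only records that landed in the same block are compared.
        yield from combinations(members, 2)

records = [  # hypothetical sample data
    {"last_name": "Jones", "zip": "20850"},
    {"last_name": "JONES ", "zip": "20850-1234"},
    {"last_name": "Smith", "zip": "10001"},
    {"last_name": "Smyth", "zip": "10001"},
]

pairs = list(candidate_pairs(records))
print(pairs)  # [(0, 1)]
```

Here one comparison replaces six, but "Smith" and "Smyth" are never compared at all. That is the usual tradeoff with blocking: it buys speed at the cost of possibly missed matches, which is often mitigated by running several passes with different keys.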

Reflections: The Challenges of Master Data Resolution

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Management, Data Quality, MDM

By David Loshin

I have worked for almost fifteen years on what would today be called master data management. I recall the first significant project involved unique identification of individuals based on records pulled from about five different sources, and there were three specific challenges:
  1. Determination of identifying attributes: specifying the data elements that, when composed together, provide enough information to differentiate between records representing different entities;
  2. Identity resolution in the presence of variation: having the right algorithms, tools, and techniques for using the identifying attribute values to search for and find matching records among a collection of source data sets; and
  3. Performance management: tuning the algorithms and tools to ensure scalability that is as close to linear as possible as the volumes of data grow.
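To make the second challenge concrete, here is a minimal sketch of identity resolution across a few sources, assuming a simple hypothetical match rule (same standardized name and birth date, or same phone number) and hypothetical sample records. Matched pairs are fed into a union-find (disjoint-set) structure so that records linked through an intermediate record end up in the same entity. This is one common way to do the grouping, not a description of the original project's method.

```python
import re
from itertools import combinations

def name_key(name):
    # Uppercase, replace punctuation with spaces, and sort tokens so that
    # "Smith, John" and "JOHN SMITH" standardize to the same key.
    tokens = re.sub(r"[^A-Z ]", " ", name.upper()).split()
    return " ".join(sorted(tokens))

class UnionFind:
    """Disjoint-set structure used to group matched records into entities."""

    def __init__(self, size):
        self.parent = list(range(size))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

def is_match(a, b):
    # Hypothetical rule: same standardized name and birth date, or same phone.
    same_person = a["dob"] and a["dob"] == b["dob"] and name_key(a["name"]) == name_key(b["name"])
    same_phone = a["phone"] and a["phone"] == b["phone"]
    return bool(same_person or same_phone)

records = [  # hypothetical records pulled from different sources
    {"source": "billing", "name": "Smith, John", "dob": "1970-03-15", "phone": "555-0100"},
    {"source": "crm", "name": "JOHN SMITH", "dob": "1970-03-15", "phone": "555-0199"},
    {"source": "support", "name": "J. Smith", "dob": "", "phone": "555-0199"},
    {"source": "crm", "name": "Jane Smith", "dob": "1982-07-04", "phone": "555-0142"},
]

uf = UnionFind(len(records))
for i, j in combinations(range(len(records)), 2):
    if is_match(records[i], records[j]):
        uf.union(i, j)

entity_ids = [uf.find(i) for i in range(len(records))]
print(entity_ids)  # [0, 0, 0, 3]
```

Records 0 and 2 never match directly, yet both resolve to the same entity through record 1. The same transitivity that makes this useful can also chain false positives together, which is one reason the choice of identifying attributes matters so much.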

Address Quality – Take 2

Blog Administrator | Address Correction, Address Quality, Address Standardization, Analyzing Data, Data Cleansing, Data Management, Data Quality, Postal Address Standards, USPS

By David Loshin

We have dealt with some of our core address quality concepts, but not this one:

The intended recipient must be associated with the deliverable address.

The problem here is no longer address quality but rather address correctness.

The address may be complete, all the elements may be valid, the ZIP+4 is the right one, and all values conform to standardized abbreviations … and still be incorrect if the recipient is not associated with the address!…
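The distinction can be sketched in a few lines: an address that has already passed the quality checks (complete, valid elements, correct ZIP+4, standardized abbreviations) is only *correct* for a recipient if that recipient is associated with it. The occupancy table below is a hypothetical stand-in for whatever trusted reference data an organization might use, and the address, ZIP+4, and names are made up for illustration.

```python
def standardize_name(name):
    # Minimal standardization: uppercase and collapse whitespace.
    return " ".join(name.upper().split())

# Hypothetical reference data: standardized delivery addresses mapped to the
# recipients known to be associated with them (an assumption for illustration).
KNOWN_OCCUPANTS = {
    "123 MAIN ST APT 4 SPRINGFIELD IL 62701-1234": {"JANE DOE", "JOHN DOE"},
}

def is_correct(recipient, address):
    """True only if the recipient is associated with this (already valid) address."""
    occupants = KNOWN_OCCUPANTS.get(address, set())
    return standardize_name(recipient) in occupants

addr = "123 MAIN ST APT 4 SPRINGFIELD IL 62701-1234"
print(is_correct("Jane  Doe", addr))    # True
print(is_correct("Richard Roe", addr))  # False: valid address, wrong recipient
```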

Modeling Issues and Entity Inheritance

Blog Administrator | Data Management, Data Quality, Fuzzy Matching, Record Linkage

By David Loshin

In our last set of posts, we looked at matching and record linkage and how approximate matching could be used to improve the organization's view of "customer centricity." Data quality tools such as parsing, standardization, and business-rule-based record linkage and similarity scoring can help assess the similarity between two records. The result of the similarity analysis is a score that indicates the likelihood that two records refer to the same real-life individual or organization.
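As a rough illustration of similarity scoring, the sketch below standardizes each attribute and combines per-attribute string similarity into a weighted score between 0 and 1. It uses Python's `difflib.SequenceMatcher` ratio as a stand-in for whatever comparison a data quality tool would actually apply; the attribute weights, the 0.85 threshold, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher

def standardize(value):
    # Minimal standardization: uppercase, drop periods and commas, collapse spaces.
    return " ".join(value.upper().replace(".", " ").replace(",", " ").split())

# Hypothetical attribute weights (they sum to 1.0, so scores stay in [0, 1]).
WEIGHTS = {"name": 0.5, "street": 0.3, "city": 0.2}
MATCH_THRESHOLD = 0.85  # hypothetical cutoff for "likely the same entity"

def similarity(a, b):
    """Weighted average of per-attribute string similarity."""
    return sum(
        weight * SequenceMatcher(None, standardize(a[field]), standardize(b[field])).ratio()
        for field, weight in WEIGHTS.items()
    )

r1 = {"name": "Robert Smith", "street": "12 Oak St.", "city": "Springfield"}
r2 = {"name": "ROBERT SMITH", "street": "12 Oak St", "city": "Springfield"}
r3 = {"name": "Roberta Smithers", "street": "98 Elm Ave", "city": "Shelbyville"}

for other in (r2, r3):
    score = similarity(r1, other)
    verdict = "likely match" if score >= MATCH_THRESHOLD else "unlikely match"
    print(f"{score:.2f} {verdict}")
```

Standardization does much of the work here: without it, the case and punctuation differences between `r1` and `r2` would lower their score even though they clearly describe the same person.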

The Challenge of Identifying Information

Blog Administrator | Analyzing Data, Data Integration, Data Management, Data Quality, Record Linkage

By David Loshin

In my last post, I introduced the question of determining which characteristics are used to uniquely differentiate between any pair of records within a data set. The same question is relevant when attempting to match a pair of records to determine whether they represent the same entity. I like to call these "identifying attributes," and the values contained therein I call "identifying information."
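One simple way to test a candidate set of identifying attributes is to compose their values into a key and check that no two records share it. The sketch below assumes a hypothetical reference set in which each record is already known to represent a distinct person. Passing this check on one sample only shows the attributes differentiate *these* records; it does not guarantee they will stay unique as more data arrives.

```python
from collections import Counter

def is_identifying(records, attributes):
    """True if the composed values of `attributes` are distinct for every record."""
    composite_keys = Counter(tuple(r[a] for a in attributes) for r in records)
    return all(count == 1 for count in composite_keys.values())

people = [  # hypothetical records, each known to be a different person
    {"first": "Mary", "last": "Jones", "dob": "1980-02-11", "zip": "20850"},
    {"first": "Mary", "last": "Jones", "dob": "1975-09-30", "zip": "20850"},
    {"first": "Mark", "last": "Jones", "dob": "1980-02-11", "zip": "10001"},
]

for attrs in (["first", "last"], ["last", "dob"], ["first", "last", "dob"]):
    print(attrs, is_identifying(people, attrs))
```

Neither name alone nor last name plus birth date separates these three people, but first name, last name, and birth date composed together do.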