Performance Scalability

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Integration, Data Management, Data Matching, Data Quality, Duplicate Elimination, MDM

By David Loshin

In my last post I noted that there is a growing need for continuous entity identification and identity resolution as part of the information architecture for most businesses, and that the need for these tools is only growing in proportion to the types and volumes of data that are absorbed from different sources and analyzed.
Read More

Reflections: The Challenges of Master Data Resolution

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Management, Data Quality, MDM

By David Loshin

I have worked for almost fifteen years on what would today be called master data management. I recall the first significant project involved unique identification of individuals based on records pulled from about five different sources, and there were three specific challenges:
  1. Determination of identifying attributes – specifying the data elements that, when composed together, provide enough information to differentiate between records representing different entities;
  2. Identity resolution in the presence of variation – having the right algorithms, tools, and techniques for using the identifying attribute values to search for and find matching records among a collection of source data sets; and
  3. Performance management – tuning the algorithms and tools properly to ensure (as close to) linear scalability as the volumes of data grow.
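
The three challenges above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not the approach used on the project described: the attribute names, the blocking rule, and the 0.85 threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for a real matching algorithm. Grouping records into blocks before comparing them is one common way to avoid comparing every record against every other record as volumes grow.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical identifying attributes; a real project would pick these by profiling the data.
IDENTIFYING_ATTRIBUTES = ("name", "birth_date", "postal_code")


def similarity(a, b):
    """Average string similarity (0.0 to 1.0) across the identifying attributes."""
    scores = [
        SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        for field in IDENTIFYING_ATTRIBUTES
    ]
    return sum(scores) / len(scores)


def block_key(record):
    """Assumed blocking rule: birth year plus the first three postal code characters."""
    return (record["birth_date"][:4], record["postal_code"][:3])


def find_matches(records, threshold=0.85):
    """Compare records only within the same block and return pairs of matching ids."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)

    matches = []
    for group in blocks.values():
        for a, b in combinations(group, 2):
            if similarity(a, b) >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


# Made-up sample records, as if pulled from different sources.
records = [
    {"id": 1, "name": "John Smith", "birth_date": "1970-03-15", "postal_code": "20850"},
    {"id": 2, "name": "Jon Smith", "birth_date": "1970-03-15", "postal_code": "20850"},
    {"id": 3, "name": "Mary Jones", "birth_date": "1970-08-02", "postal_code": "20852"},
    {"id": 4, "name": "John Smith", "birth_date": "1982-11-30", "postal_code": "10001"},
]

print(find_matches(records))
```

Records 1 and 2 differ only in the spelling of the first name and share a block, so they are reported as a match; record 4 has the same name as record 1 but lands in a different block and is never compared. The trade-off is that a blocking rule can hide true matches when the blocking attributes themselves vary.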
Read More

Address Quality – Take 2

Blog Administrator | Address Correction, Address Quality, Address Standardization, Analyzing Data, Data Cleansing, Data Management, Data Quality, Postal Address Standards, USPS

By David Loshin

We have dealt with some of our core address quality concepts, but not this one:

The intended recipient must be associated with the deliverable address.

The problem here is no longer address quality but rather address correctness.

Read More

Modeling Issues and Entity Inheritance

Blog Administrator | Data Management, Data Quality, Fuzzy Matching, Record Linkage

By David Loshin

In our last set of posts, we looked at matching and record linkage and how approximate matching could be used to improve the organization’s view of “customer centricity.” Data quality tools such as parsing, standardization, and business-rule-based record linkage and similarity scoring can help in assessing the similarity between two records.
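
To make the chain of parsing, standardization, and similarity scoring concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the nickname table, the two accepted name layouts, and the 0.4/0.6 weights are hypothetical, and `difflib.SequenceMatcher` from the standard library is a stand-in for whatever scoring a commercial data quality tool applies.

```python
import re
from difflib import SequenceMatcher

# Hypothetical nickname table used for standardization.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def parse_name(raw):
    """Parse a raw name into (first, last), accepting 'Last, First' or 'First Last'."""
    cleaned = re.sub(r"[^\w\s,]", "", raw).strip().lower()
    if "," in cleaned:
        last, first = [part.strip() for part in cleaned.split(",", 1)]
    else:
        tokens = cleaned.split()
        first, last = tokens[0], tokens[-1]
    return first, last


def standardize(raw):
    """Parse the name, then map a known nickname to its formal form."""
    first, last = parse_name(raw)
    return NICKNAMES.get(first, first), last


def score(raw_a, raw_b, first_weight=0.4, last_weight=0.6):
    """Weighted similarity of two names after standardization (weights are assumed)."""
    first_a, last_a = standardize(raw_a)
    first_b, last_b = standardize(raw_b)
    return (
        first_weight * SequenceMatcher(None, first_a, first_b).ratio()
        + last_weight * SequenceMatcher(None, last_a, last_b).ratio()
    )


print(standardize("Smith, Bill"))
print(score("Smith, Bill", "William Smith"))
```

Without the standardization step, "Bill" and "William" would score poorly against each other even though a business rule says they can name the same person, which is why scoring usually comes last in the chain.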
Read More

The Challenge of Identifying Information

Blog Administrator | Analyzing Data, Data Integration, Data Management, Data Quality, Record Linkage

By David Loshin

In my last post, I introduced the question of determining which characteristics are used to uniquely differentiate between any pair of records within a data set. The same question is relevant when attempting to match a pair of records as well, once they are determined to represent the same entity.
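
One rough way to explore that question is to test which combinations of attributes take a distinct value for every record in a sample. The Python sketch below does this by brute force; the function name, attribute names, and sample records are all hypothetical, and the approach assumes the sample contains no duplicate entities to begin with.

```python
from itertools import combinations


def distinguishing_sets(records, attributes):
    """Return the smallest attribute combinations whose values are unique across records."""
    for size in range(1, len(attributes) + 1):
        found = [
            combo
            for combo in combinations(attributes, size)
            if len({tuple(r[a] for a in combo) for r in records}) == len(records)
        ]
        if found:
            return found
    return []


# Made-up sample: no single attribute separates all three records.
sample = [
    {"name": "John Smith", "birth_date": "1970-03-15", "postal_code": "20850"},
    {"name": "John Smith", "birth_date": "1982-11-30", "postal_code": "20850"},
    {"name": "Mary Jones", "birth_date": "1970-03-15", "postal_code": "10001"},
]

print(distinguishing_sets(sample, ["name", "birth_date", "postal_code"]))
```

Here no attribute is unique on its own, but name plus birth date, or birth date plus postal code, differentiates every pair. The search is exponential in the number of attributes, so it only suits a small candidate list.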
Read More