By David Loshin
I have worked for almost fifteen years on what would today be called master data management. My first significant project involved uniquely identifying individuals from records pulled from about five different sources, and it posed three specific challenges:
- Determination of identifying attributes – specifying the data elements that, when composed together, provide enough information to differentiate between records representing different entities;
- Identity resolution in the presence of variation – having the right algorithms, tools, and techniques for using the identifying attribute values to search for and find matching records among a collection of source data sets; and
- Performance management – tuning the algorithms and tools properly so that scalability stays as close to linear as possible as the volumes of data grow.
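
To make the three challenges concrete, here is a minimal Python sketch, not the approach used on that project. Everything in it is an assumption for illustration: the attribute names (`first_name`, `last_name`, `birth_date`), the 0.85 threshold, the use of the standard library's `difflib.SequenceMatcher` as the similarity measure, and blocking on birth date as one simple way to avoid comparing every record with every other record.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical identifying attributes; a real project has to determine its own.
IDENTIFYING_ATTRIBUTES = ("first_name", "last_name", "birth_date")


def normalize(value):
    """Lowercase and collapse whitespace so trivial variation is ignored."""
    return " ".join(str(value).lower().split())


def identity_key(record):
    """Compose the identifying attribute values into one comparable string."""
    return "|".join(normalize(record.get(attr, "")) for attr in IDENTIFYING_ATTRIBUTES)


def similarity(a, b):
    """Approximate string similarity between two records, from 0.0 to 1.0."""
    return SequenceMatcher(None, identity_key(a), identity_key(b)).ratio()


def find_matches(records, threshold=0.85):
    """Return index pairs of records that likely represent the same individual.

    Records are first grouped ("blocked") by birth date, an assumed blocking
    key, so pairwise comparison happens only inside each group rather than
    across the whole collection.
    """
    blocks = defaultdict(list)
    for index, record in enumerate(records):
        blocks[normalize(record.get("birth_date", ""))].append(index)

    matches = []
    for indices in blocks.values():
        for i in range(len(indices)):
            for j in range(i + 1, len(indices)):
                a, b = indices[i], indices[j]
                if similarity(records[a], records[b]) >= threshold:
                    matches.append((a, b))
    return matches


# Made-up sample records standing in for rows pulled from different sources.
records = [
    {"first_name": "Jon", "last_name": "Smith", "birth_date": "1970-01-01"},
    {"first_name": "John", "last_name": "Smith", "birth_date": "1970-01-01"},
    {"first_name": "Mary", "last_name": "Jones", "birth_date": "1970-01-01"},
    {"first_name": "John", "last_name": "Smith", "birth_date": "1985-06-30"},
]

print(find_matches(records))
```

The sketch also shows the trade-off between the second and third challenges: blocking cuts the number of comparisons, but a record whose blocking attribute is wrong or missing is never compared with its true match, so the choice of blocking key affects both speed and accuracy.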