Selecting a surviving record means choosing the best possible candidate to
represent a group of matching records. However, “best” from the perspective of
survivorship can mean many things. It can be affected by the structure of the
data, where the data is gathered from, how the data comes in, what kind of data
is stored, and sometimes by the nature of business rules. Different techniques
can therefore be applied to accommodate these variations when performing
survivorship. Three techniques are very commonly used to determine the
surviving record:
I. Most Recent
Date-stamped records can be ordered from most recent to least recent. The most
recent record can be considered eligible as the survivor.
II. Most Frequent
Matching records containing the same information are also an indication of
correctness. Repeating records indicate that the information is persistent and
therefore more likely to be correct.
III. Most Complete
Field completeness is another factor to consider. Records with more of their
available fields populated are also viable candidates for survivorship.
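The three generic techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the field names (name, phone, email, updated) and the helper functions are hypothetical and are not taken from any Melissa Data product.

```python
from collections import Counter
from datetime import date

# Hypothetical group of matching records; field names are illustrative only.
records = [
    {"name": "J. Smith", "phone": "", "email": "js@example.com",
     "updated": date(2014, 3, 1)},
    {"name": "John Smith", "phone": "555-0100", "email": "js@example.com",
     "updated": date(2013, 6, 15)},
    {"name": "John Smith", "phone": "", "email": "",
     "updated": date(2012, 1, 9)},
]

def most_recent(group):
    """Most Recent: pick the record with the latest date stamp."""
    return max(group, key=lambda r: r["updated"])

def most_frequent(group, field):
    """Most Frequent: pick the first record carrying the most common
    non-empty value of the given field."""
    counts = Counter(r[field] for r in group if r[field])
    if not counts:
        return group[0]  # nothing to count, fall back to the first record
    top_value, _ = counts.most_common(1)[0]
    return next(r for r in group if r[field] == top_value)

def most_complete(group):
    """Most Complete: pick the record with the most populated fields."""
    return max(group, key=lambda r: sum(1 for v in r.values() if v))
```

Note that the three functions can disagree on the same group: here the most recent record is the first one, while the most frequent name and the most complete record both point to the second. None of them looks at whether the values themselves are correct.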
Although these techniques are commonly applied in survivorship schemas, their
results may not be reliable in many circumstances. Because these techniques
apply to almost any type of data, the basis on which a surviving record is
created conforms only to “generic” rules. This is where Melissa Data is able to
set itself apart from “generic” survivorship. By leveraging reference data, we
can steer toward better and more effective schemas for survivorship.
The incorporation of reference data in survivorship changes how rules come into
play. Selecting by Most Recent, Most Frequent or Most Complete logic relies on
surface characteristics of the records rather than on their content. Ideally,
the selection of the surviving record should be based on an actual
understanding of our data.
This is where reference data comes into play. What it boils down to in the end
is simply being able to consolidate the best quality data. By incorporating
reference data, we gain an understanding of the actual contents of the data and
can make better decisions for survivorship. Let’s take a look at some instances
of how reference data and data quality affect decisions for survivorship:
I. Address Quality
Separating good data from bad data should take precedence in making decisions
for survivorship. In the case of addresses, giving priority to good addresses
(those that can be verified against reference data) makes for a better decision
in the survivorship schema.
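As a minimal sketch of this idea, the snippet below prefers a verified address over a more recent but unverified one. The address_verified flag is an assumption: it stands in for the outcome of whatever reference-data check is used and is not a real product field. The sample records are made up.

```python
from datetime import date

# Hypothetical records; "address_verified" stands in for the result of
# checking each address against reference data (not a real API field).
group = [
    {"address": "123 Main Stret", "address_verified": False,
     "updated": date(2014, 5, 2)},
    {"address": "123 Main St", "address_verified": True,
     "updated": date(2012, 8, 20)},
]

def pick_by_address_quality(records):
    """Prefer verified addresses; use the date stamp only to break ties."""
    return max(records, key=lambda r: (r["address_verified"], r["updated"]))

survivor = pick_by_address_quality(group)
```

A Most Recent rule alone would keep the misspelled 2014 address. Ranking on the quality flag first keeps the older, verified one.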
II. Record Quality
It could also be argued that more than one record with good data may exist in a
single group of matching records. In cases like these, we can assess the
overall quality of each record by taking into consideration other pieces of
information that affect the weight of overall data quality. Take for example
the following data:
In this case, the ideal approach is to evaluate multiple elements for each record in the group. Since the second record contains a valid phone number, it can be given more weight or more importance than the third record, even though the third record is more complete.
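One way to express this kind of multi-element weighting is a simple score per record. The sketch below is hypothetical throughout: the records are made up for illustration and are not the article's example data, the *_valid flags stand in for reference-data checks, and the weights are arbitrary.

```python
# Hypothetical weights per validated element; tune to the business rules.
WEIGHTS = {"address_valid": 3, "phone_valid": 2, "email_valid": 1}

def quality_score(record):
    """Sum the weights of every element that passed validation."""
    return sum(w for flag, w in WEIGHTS.items() if record.get(flag))

def completeness(record):
    """Count populated data fields, ignoring the validation flags."""
    return sum(1 for k, v in record.items() if k not in WEIGHTS and v)

def pick_survivor(group):
    """Rank by quality first, then by completeness as a tie-breaker."""
    return max(group, key=lambda r: (quality_score(r), completeness(r)))

# Made-up group: all three addresses validate, only the second phone does.
group = [
    {"name": "Jane Doe", "address": "1 Elm St", "phone": "", "email": "",
     "address_valid": True, "phone_valid": False, "email_valid": False},
    {"name": "Jane Doe", "address": "1 Elm St", "phone": "555-0101",
     "email": "",
     "address_valid": True, "phone_valid": True, "email_valid": False},
    {"name": "Jane Doe", "address": "1 Elm St", "phone": "555-01",
     "email": "jd@example",
     "address_valid": True, "phone_valid": False, "email_valid": False},
]

survivor = pick_survivor(group)
```

Here the third record has the most populated fields, but its phone and email fail validation, so the second record scores higher and survives. Completeness only matters when quality scores are tied.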
Whether we’re working with contact data, product data or any other form of data, the methodologies and logic used for record survivorship depend primarily on data quality. However we choose to define data quality, it is imperative that we keep only the best pieces of data if we are to have the most accurate and correct information. In the case of contact data, however, Melissa Data changes the perspective on how the quality of data is defined, thereby breaking the norm of typical survivorship schemas.