Your Deduplication Processes May Be Leaving You at Risk for GDPR Fines

Melissa Team | Article, Data Audit, Data Matching, Data Quality, Duplicate Elimination, Fuzzy Matching, GDPR, Global Business, Global Data Quality, Identity Resolution

Once-trusted fuzzy matching algorithms may be leaving your organization vulnerable to hefty GDPR fines. The balancing act between false positives and false negatives in single customer view (SCV) systems used to favor the false-negative side, with near-negligible error results. However, the GDPR has redefined the standard for that balancing act. Find out how GDPR has moved the “match” goalposts, how to test your SCV platform, and what you need to do to keep your organization GDPR compliant.
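As a rough illustration of what testing a matching process can involve, the sketch below counts both kinds of error against a small, hand-labelled set of true duplicate pairs. The function name and the pair representation are assumptions made for this example; they are not part of any Melissa product.

```python
def match_errors(predicted_pairs, true_pairs):
    """Return (false_positives, false_negatives) as sets of record-id pairs.

    Pairs are stored as frozensets so that (1, 2) and (2, 1) are treated
    as the same pair. Both inputs are hypothetical hand-built lists.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    truth = {frozenset(p) for p in true_pairs}
    false_positives = predicted - truth   # merged, but actually different people
    false_negatives = truth - predicted   # same person, but not merged
    return false_positives, false_negatives


true_pairs = [(1, 2), (3, 4)]
predicted_pairs = [(2, 1), (5, 6)]
fp, fn = match_errors(predicted_pairs, true_pairs)
print(len(fp), len(fn))
```

Tracking the two counts separately, rather than a single accuracy figure, makes it visible which side of the balance a rule change has moved.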

Matchcode Caveats – How to Solve Them

Blog Administrator | Matching

By Tim Sidor, Data Quality Analyst

“The more advanced I make my matchcode, the more duplicates I’ll
identify.”

This is an assumption that many new MatchUp users make. True or false, it
often leads to false dupes, no dupes, or a process that seems to run forever.

“Why?”

Adding more columns of conditions can be looked at as ‘just adding more ways
to return more duplicates.’ These additional criteria may or may not result in
accurate groups, because you may have actually loosened your intended criteria.
On the flip side, adding matchcode components may result in fewer duplicates,
because you may have tightened your rules too much. Applying fuzzy algorithms
without thorough testing will lead to a slower process, but may not return a
significant number of additional matches (diminishing returns of accuracy and
speed versus complexity and inefficiency).
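The loosening and tightening effects can be seen in a toy model. The sketch below assumes a matchcode is a list of columns, that every component in a column must agree, and that a match in any one column is enough. This is a simplified illustration with exact comparisons only, not MatchUp's actual engine or API, and all names are hypothetical.

```python
from itertools import combinations


def records_match(a, b, matchcode):
    # Two records match if ANY column has ALL of its components equal.
    return any(
        all(a[field] == b[field] for field in column)
        for column in matchcode
    )


def count_matching_pairs(records, matchcode):
    return sum(records_match(a, b, matchcode) for a, b in combinations(records, 2))


records = [
    {"name": "ann lee", "zip": "92688", "phone": "5551000"},
    {"name": "ann lee", "zip": "92688", "phone": "5552000"},
    {"name": "bob ray", "zip": "10001", "phone": "5551000"},
]

base = [{"name", "zip"}]
looser = base + [{"phone"}]            # extra column: one more way to match
tighter = [{"name", "zip", "phone"}]   # extra component: a stricter rule

for label, mc in (("base", base), ("looser", looser), ("tighter", tighter)):
    print(label, count_matching_pairs(records, mc))
```

In this data the extra phone column pairs Ann with Bob, a likely false dupe, while the extra phone component drops the one pair the base rule found.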

“What can I do?”

When
learning to use MatchUp, we always suggest starting with the basics – a simple
default matchcode that we distribute, and a small data set. This allows you to
quickly run and analyze how the matchcode performed against the data. Then make
small changes – tweaking the matchcode and repeating the process or running a
slightly altered data set with a few variations in format or data values.
Eventually, you will migrate towards your end goal of incorporating your
business rules into the matching strategy (the matchcode) with your production
data.

 

By
following any of the above disciplined paths, you will more quickly arrive at
your goal and with a better understanding of how to create the best matchcode
for your environment. No diagonal shortcuts!

“OK, I already went straight to ‘Production Data and a Custom
Matchcode,’ what do I do?”

First, evaluate the Result Codes and Dupe Group output properties. In addition
to telling you the output disposition of a record (unique, group winner,
duplicate, etc.), the Result Codes will tell you which matchcode combination
(which column of checkmarks in the matchcode) caused the record to match in a
particular Dupe Group. If you find that a particular column never finds a
match, or never finds a match that another column hasn’t already found, you
should consider removing it. This may also prompt you to remove duplicated
component types, which may have been used with alternate settings, from the
matchcode. After re-evaluating the remaining components and concluding that
they still represent a valid strategy, you may find that your process returns
more accurate results AND runs much faster.
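The column check described above can be sketched as follows. The example assumes you have already translated each matched record's Result Codes into the set of matchcode column numbers that fired for it; that translation step and the function name are hypothetical and not shown here.

```python
def redundant_columns(matches, all_columns):
    """Return columns that never found a match on their own.

    matches: list of sets, one per matched record, holding the column
             numbers that caused the match (hypothetical pre-parsed input).
    all_columns: every column number defined in the matchcode.
    """
    candidates = []
    for col in all_columns:
        found_alone = any(m == {col} for m in matches)
        if not found_alone:
            # Either the column never fired, or another column always
            # fired alongside it.
            candidates.append(col)
    return candidates


matches = [{1}, {1, 2}, {1, 2}, {3}, {1, 3}]
print(redundant_columns(matches, all_columns=[1, 2, 3, 4]))
```

Treat the result as a list of candidates only. Two columns that always fire together will both be flagged, so remove one column at a time and re-run before removing the next.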

“Can my process run faster?”

Yes. MatchUp uses an advanced clustering method to find duplicates, and
creating advanced matchcodes prevents efficient clustering, which slows
processes down. For example, we had a customer move a matchcode component with
a fuzzy setting from the second position to below another component that was
using an exact setting (and in all columns). Their run time decreased from 47
hours to under 4 by making this simple change. Expanding on the diminishing
returns concept: if an exact matchcode returns 20,000 duplicates from a
1,000,000-record set, is changing all components to a fuzzy algorithm and then
returning 20,003 duplicates worth a process that takes four times as long to
run?
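To see why an exact component helps, consider generic blocking, where records are first grouped on an exact key and only compared within each group. The sketch below counts comparisons with and without such a key. It is a textbook illustration under that assumption, not a description of MatchUp's internal clustering, and the field names are made up.

```python
from collections import defaultdict


def all_pairs_comparisons(n):
    # Every record compared with every other record.
    return n * (n - 1) // 2


def blocked_comparisons(records, key):
    # Group on an exact key, then compare only within each group.
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[key]].append(rec)
    return sum(len(b) * (len(b) - 1) // 2 for b in blocks.values())


# 1,000 synthetic records spread evenly over 100 ZIP values.
records = [{"zip": f"{i % 100:05d}", "name": f"name{i}"} for i in range(1000)]

print(all_pairs_comparisons(len(records)))   # no exact key to group on
print(blocked_comparisons(records, "zip"))   # grouped on exact ZIP first
```

In the source's example, the three extra duplicates on top of 20,000 amount to a 0.015 percent gain for a fourfold increase in run time.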

“What about that Result Code that tells me a specific combination
returned a false dupe?” or “Why did these records not match under my rules?”

For details
on how a matchcode relates to your data, click here for easy guidance to
understanding your matchcode rules, and remember, test thoroughly!

For more info, go to: https://www.melissa.com/data-deduplication

Record Matching Made Easy with MatchUp Web Service

Blog Administrator | Data Governance, Data Integration, Data Management, Data Matching, Data Quality, Data Quality Components for SSIS, Data Steward, Data Warehouse, Duplicate Elimination, Fuzzy Matching, Golden Record, Householding, Identity Resolution, Record Linkage, SQL Server Integration Services, SSIS, Survivorship

MatchUp®, Melissa’s solution to identify and eliminate duplicate records, is
now available as a web service for batch processes, fulfilling one of the most
frequent requests from our customers: accurate database matching without
maintaining and linking to libraries, or shelling out to the necessary
locally-hosted data files.

Now you can integrate MatchUp into any aspect of your network that can
communicate with our secure servers using common formats and protocols such as
XML, JSON, REST or SOAP.

Select
a predefined matching strategy, map the table input columns necessary to
identify matches to the respective request elements, and submit the records for
processing. Duplicate rows can be identified by a combination of NAME, ADDRESS,
COMPANY, PHONE and/or EMAIL.

 

Our
select list of matching strategies removes the complexity of configuring rules,
while still applying our fast and versatile fuzzy matching algorithms and
extensive datatype-specific knowledge base, ensuring the tough-to-identify
duplicates will be flagged by MatchUp. 

 

The output response returned by the service can be used to update a database
or create a unique marketing list by evaluating each record’s result codes,
group identifier and group count, and using the record’s unique identifier to
link back to the original database record.
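A minimal sketch of that post-processing step is shown below. The field names (`record_id`, `group_id`, `group_count`, `disposition`) and the disposition values are placeholders invented for this example; the actual response schema and result code values should be taken from the service documentation.

```python
def split_response(response_records):
    """Separate records to keep from duplicates to suppress.

    Each record is a dict with hypothetical keys: 'record_id' (your own
    unique identifier), 'group_id', 'group_count' and 'disposition'.
    """
    keep_ids, duplicate_ids = [], []
    for rec in response_records:
        if rec["disposition"] in ("unique", "group_winner"):
            keep_ids.append(rec["record_id"])
        else:
            duplicate_ids.append(rec["record_id"])
    return keep_ids, duplicate_ids


response_records = [
    {"record_id": 101, "group_id": 1, "group_count": 2, "disposition": "group_winner"},
    {"record_id": 102, "group_id": 1, "group_count": 2, "disposition": "duplicate"},
    {"record_id": 103, "group_id": 2, "group_count": 1, "disposition": "unique"},
]

keep_ids, duplicate_ids = split_response(response_records)
print(keep_ids, duplicate_ids)
```

The kept identifiers can then drive the unique marketing list, while the duplicate identifiers can be used to flag or merge rows in the original database.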

 

Since
Melissa’s servers do the processing, there are no key files – the temporary
sorting files – to manage, freeing up valuable hardware resources on your local
server.

 

Customers can access the MatchUp Web Service by obtaining a valid license from
our sales team and selecting the endpoint and request structures compatible
with their development platform here.

Join Us for Address Cleaning in SAP: Integration of REST Web Services

Blog Administrator | Uncategorized

 

Join us for a webinar on December 2, 2010.

With data changing rapidly and accurate information needed in a timely and efficient manner, many companies are looking at ways to integrate contact data verification into their CRM systems to prevent bad address, phone and email data from entering the database at the point of data entry.

During this webinar, Forrest Horner and Chris Twirbutt of the County of Sacramento will show you how they integrated real-time, web-service-based address verification into SAP and which specific design considerations to be aware of. A demonstration will also be performed.

 

Reserve your webinar seat now by clicking the link below:
Thur, December 2, 2010 12:00 PM – 1:00 PM PDT

Melissa Data hosts weekly seminars every Wednesday at 10 a.m. PST on data quality topics ranging from data validation to fuzzy matching.

To view a calendar of our upcoming topics and reserve your seat, click here: http://www.melissadata.com/shows/index.htm