Data Cleansing and Simple Business Rules

Blog Administrator | Address Quality, Analyzing Data, Data Cleansing, Data Quality

By David Loshin

Having worked as a data quality tool software developer, rules developer, and consultant, I am relatively familiar with some of the idiosyncrasies associated with building an effective business rules set for data standardization and, particularly, data cleansing. At first blush, the process seems relatively straightforward: I have a data value in a character string that I believe to be incorrect, and I want to use the automated transformative capability of a business rule to turn that incorrect string into a correct one.

Here is a simple example: for address correction, we’d like to expand the
abbreviations for street types (such as “road,” “street,” “avenue,” etc.). For
the street type “STREET,” we might have rules such as:

• STR is transformed into STREET
• ST is transformed into STREET
• St. is transformed into STREET
• Str is transformed into STREET
• Str. is transformed into STREET

And so on. The approach would be to integrate these rules into a data
cleansing rules engine and then present the strings to be corrected to that
engine. To continue the example (and assuming we also included a rule that
upper-cases all letters), the string “1250 Main Str.” might be transformed
into “1250 MAIN STREET” and passed back to the calling routine. Seems simple,
right?
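
As a rough illustration of the kind of engine described above, here is a minimal Python sketch. The names `STREET_RULES` and `cleanse` are hypothetical, and the sketch assumes the upper-casing rule runs first, so the mixed-case variants in the list collapse to a handful of upper-case keys. A real rules engine would be considerably richer than this.

```python
# Hypothetical rule table: each abbreviation maps to the full street type.
# Keys are upper-case because the string is upper-cased before matching.
STREET_RULES = {
    "STR": "STREET",
    "STR.": "STREET",
    "ST": "STREET",
    "ST.": "STREET",
}


def cleanse(address: str) -> str:
    """Upper-case the address, then replace every token that matches a rule."""
    tokens = address.upper().split()
    return " ".join(STREET_RULES.get(token, token) for token in tokens)


print(cleanse("1250 Main Str."))  # 1250 MAIN STREET
```

Note that every whitespace-separated token is checked against the rule table in the same way, regardless of where it appears in the string.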

Of course it is. And simplistic as well, since the same transformations would
also fire on a street name such as “St. Charles St,” which would be changed
into “STREET CHARLES STREET” under that same set of rules. Because each rule is
so basic, there are no controls over how, where, and when it is applied. We’d
need more rules, and a bit more control over them, to transform the data
effectively and correct it properly. More in my next post…
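
For readers who want to see the over-correction concretely, the following Python sketch traces which tokens a context-free rule table would match in “St. Charles St”. The names `RULES` and `trace_matches` are hypothetical and stand in for whatever a real rules engine provides. The sketch only reproduces the problem and does not attempt to solve it.

```python
# Hypothetical rule table, upper-cased as in the example above.
RULES = {"STR": "STREET", "STR.": "STREET", "ST": "STREET", "ST.": "STREET"}


def trace_matches(address: str) -> list[tuple[int, str, str]]:
    """Return (position, token, replacement) for every token a rule matches."""
    tokens = address.upper().split()
    return [(i, tok, RULES[tok]) for i, tok in enumerate(tokens) if tok in RULES]


for position, token, replacement in trace_matches("St. Charles St"):
    print(position, token, "->", replacement)
# 0 ST. -> STREET
# 2 ST -> STREET
```

The rule fires at position 0 as well as position 2 because nothing in the rule says where in the string it is allowed to apply.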