By David Loshin
There are many obvious cross-column dependency rules among the data attributes typically associated with an individual. For example, contact data carries embedded information that suggests a validity constraint, such as insisting that a street address in a particular town has the correct postal code.
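As a rough illustration, the address example could be checked as in the Python sketch below. The lookup table, the field names (town, state, postal_code), and the function name are assumptions made for this example; a real check would rely on an authoritative postal reference source rather than a hand-built dictionary.

```python
# Hypothetical reference data: (town, state) -> postal codes accepted for it.
# Illustrative only; a production check would use an authoritative source.
VALID_POSTAL_CODES = {
    ("Springfield", "IL"): {"62701", "62702", "62703"},
    ("Madison", "WI"): {"53703", "53704"},
}


def postal_code_is_consistent(record):
    """Return True when the postal code is accepted for the record's town and state."""
    key = (record.get("town"), record.get("state"))
    allowed = VALID_POSTAL_CODES.get(key)
    if allowed is None:
        # Town is not in the reference data, so consistency cannot be confirmed.
        return False
    return record.get("postal_code") in allowed


# Assumed record layout for the example.
records = [
    {"street": "1 Main St", "town": "Springfield", "state": "IL", "postal_code": "62701"},
    {"street": "9 Oak Ave", "town": "Madison", "state": "WI", "postal_code": "62701"},
]

# Collect the records whose town and postal code disagree.
violations = [r for r in records if not postal_code_is_consistent(r)]
for r in violations:
    print("Inconsistent town/postal code:", r["street"], r["town"], r["postal_code"])
```

Each column value here can be valid on its own; the rule fails only when the combination of values is inconsistent, which is what makes it a cross-column check.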
More interesting are those scenarios that connect valid data to proper completion of business processes.
One example that I used to see years ago involved a telesales process that required a customer account number to exist before the transaction could take place. Unfortunately, the process depended on executing the transaction quickly, and the time required to create and approve the system account record would have delayed the sale to the point at which the prospective customer might no longer want to make the purchase.
Therefore, the salespeople maintained a few “dummy” accounts that they used in these situations to complete the transaction, and then (most of the time) they would later transfer that transaction to a newly created customer account. This presents an interesting opportunity for a data quality rule: all sales transactions must be associated with a valid (i.e., not a “dummy”) customer account.
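A minimal sketch of how that rule might be expressed in Python follows. The account identifiers, the field names (txn_id, account_id), and the function name are all hypothetical; in practice the list of dummy accounts would have to come from the business users who created them.

```python
# Hypothetical identifiers for the placeholder accounts used by the sales team.
DUMMY_ACCOUNT_IDS = {"DUMMY-001", "DUMMY-002"}


def find_invalid_transactions(transactions, known_accounts, dummy_accounts=DUMMY_ACCOUNT_IDS):
    """Return (txn_id, reason) pairs for transactions that violate the rule."""
    violations = []
    for txn in transactions:
        account_id = txn.get("account_id")
        if account_id is None:
            reason = "missing account"
        elif account_id in dummy_accounts:
            reason = "dummy account"
        elif account_id not in known_accounts:
            reason = "unknown account"
        else:
            continue  # Transaction is tied to a valid customer account.
        violations.append((txn["txn_id"], reason))
    return violations


# Assumed sample data for the example.
known_accounts = {"C-1001", "C-1002", "DUMMY-001", "DUMMY-002"}
transactions = [
    {"txn_id": "T1", "account_id": "C-1001"},
    {"txn_id": "T2", "account_id": "DUMMY-001"},
    {"txn_id": "T3", "account_id": "C-9999"},
    {"txn_id": "T4", "account_id": None},
]

violations = find_invalid_transactions(transactions, known_accounts)
for txn_id, reason in violations:
    print(txn_id, reason)
```

Note that a dummy account is a real record in the accounts table, so a plain referential-integrity check would pass it. The rule needs the additional business knowledge of which accounts are placeholders.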
Cross-column consistency can manifest itself in a number of ways: as implied connectivity between two or more data attributes (such as the address consistency example) or as assurance that some process executed correctly (such as our customer account example). Either way, the data quality analyst must work with the business users to understand what the expectations are for cross-column consistency.