By David Loshin

One of the most challenging data flaws that appear in relational database systems is the absence of referential integrity across different tables. For example, consider a transaction processing system in which one table captures the quantity and total costs for purchased items, and each record refers to a product reference code that can be looked up in a product master table.

What happens if a purchase record refers to a product code that does not exist in the product master table?

Typically, for the execution of the transaction, this would certainly be a problem in attempting to fulfill an order for unknown products. But even if you assume that the product existed when it was purchased, it might have been eliminated or replaced as time advances and the product line is upgraded or enhanced. In this case, historical reporting or trend analysis would inadvertently fail, potentially leading to flawed results and even flawed decisions.

This is an example of the need for assurance of referential integrity, which asserts that a reference to an entity used in one table exists as an attribute in either the same or a different table. More simply, it implies that you may not refer to something that does not exist. Integrity constraints are examples of cross-table data quality rules that can be validated using data profiling.

And in the upcoming post, we will discuss how data profiling tools are instructed to perform cross-column and cross-table validation.





Leave a Reply

Your email address will not be published. Required fields are marked *