All or Nothing?

By David Loshin

One of the most frequently referenced dimensions of data quality is completeness. At a formal level, completeness implies rules specifying mandatory assignment of values to particular data elements. In layman’s terms, completeness rules make sure that critical attributes are populated with values.

Now there are a few things to think about here regarding the critical nature of
completeness rules for data validity, from the data creation side and from the
data consumption side.

Let’s start with the consumption side and consider the reasons behind
completeness expectations in two different use cases: transaction processing
and analytical processing. There should be little doubt about the need for
completeness for transaction processing purposes: in most transactions, some
data values are required for the transaction to complete successfully. For
example, your online order won’t complete if you don’t provide a method of
payment.

However, as more organizations begin to examine how their business processes
span different functions in the business, there is a greater recognition of
requirements for data values that might not be needed immediately but will
eventually be used downstream. To continue our example, once your online order
has been placed, the items can’t be delivered to you if you did not provide a
shipping address. That means the shipping address is required (and must be
complete) when the order transaction takes place.
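
To make the distinction concrete, here is a minimal sketch of a completeness
check for the online order example. The field names and the two rule sets are
hypothetical, chosen only for illustration; a real system would define its own
mandatory attributes.

```python
# Hypothetical field names; both rule sets are illustrative assumptions.
REQUIRED_AT_ORDER = {"payment_method"}      # needed for the transaction itself
REQUIRED_DOWNSTREAM = {"shipping_address"}  # needed later, for delivery


def is_populated(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def missing_fields(record, required):
    """Return the sorted names of required fields that have no value."""
    return sorted(f for f in required if not is_populated(record.get(f)))


# The shipping address is present as a key but holds only whitespace.
order = {"order_id": 1001, "payment_method": "card", "shipping_address": "  "}

# Checking only what the transaction needs right now: nothing is missing.
print(missing_fields(order, REQUIRED_AT_ORDER))                        # []

# Adding the downstream requirement exposes the gap at order time.
print(missing_fields(order, REQUIRED_AT_ORDER | REQUIRED_DOWNSTREAM))  # ['shipping_address']
```

The order passes the narrow transactional check but fails once the downstream
requirement is included, which is the argument for validating both sets when
the order is placed.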

From the analytical perspective, we also have data completeness expectations,
and they become particularly pertinent for aggregations and roll-ups. Consider a
report that combines measures for total sales and for average sales, but some of
the records are missing sales amounts. Both the totals and the averages are
going to be inaccurate as a result of the missing values.
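
The effect on a roll-up can be seen in a small sketch. The sales figures are
made up, and `None` is assumed to mark a missing sales amount. Note that the
average depends on how the missing records are treated, and neither choice
recovers the true figure.

```python
# Made-up sales records; None marks a missing sales amount (an assumption).
sales = [100.0, 250.0, None, 50.0, None]

known = [s for s in sales if s is not None]
completeness = len(known) / len(sales)   # share of records with a value

total = sum(known)                       # silently omits the missing records
avg_known = total / len(known)           # average over populated records only
avg_as_zero = total / len(sales)         # average if missing is counted as 0

print(f"completeness: {completeness:.0%}")          # completeness: 60%
print(f"total: {total}")                            # total: 400.0
print(f"avg (known only): {avg_known:.2f}")         # avg (known only): 133.33
print(f"avg (missing as 0): {avg_as_zero:.2f}")     # avg (missing as 0): 80.00
```

Reporting the completeness ratio alongside the aggregates at least tells the
reader how much of the data the totals and averages actually cover.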

In both usage scenarios, missing data is an issue, and our next set of entries
will examine missing data in more detail.