The Meaning of Nothing

Blog Administrator | Address Quality, Analyzing Data, Analyzing Data Quality, Data Management, Data Quality | , , ,

By David Loshin

What does it mean when a data element has a null value? In my previous posts, I sort of suggested that the data value was “not available” but that is a bit presumptive. At earlier stages in my data career, I spent a lot of time thinking about the meaning of a null value, and considering the reasons that a data element’s value was missing. Here are some ideas I have proposed:

• The data element does have a value in real life, but the person entering the data does not know the value. As an example, the application might ask for a customer’s fax number, but person entering the data did not know what the fax number was.

• The person entering the data does not know if the data element really has a value. This might be an alternate to the previous item, in which the person was asked for a customer’s fax number and did not know whether the customer had a fax number.

• The data element really does not have a value, such as if the customer really did not have a fax number.

• The data element does not yet have a value. This occurs a lot in practice, such as when a record is created to allow a new customer to purchase a product and the order is put through before the customer has been assigned a customer ID.

• The choices for entering a value were not valid. I was alerted to a scenario in which a Canadian person was not able to do an online order for downloading software because the system required the customer to enter a 5-digit postal code, but Canadian postal codes are 6 characters long.

• The data element’s value was incorrect and had to be removed until the correct value was found.

The more I thought about the use cases, it occurred to me that the absence of a value is much more perplexing than an incorrect value or an untimely value. In those sort of cases, the knowledge of the flaw is basically known, and that can help is mitigation.

But for incomplete data, the data models usually don’t have a way to capture the reason for the missing value as part of the record. So while incompleteness is one of the easier types of issues to identify, it is also difficult to resolve.