The Format of Nothing
By David Loshin
Later systems allowed for different types of data values to be used as null
values. For example, in one data profiling exercise I worked on, a telephone
number field had the character string “No phone number provided” when the value
was not available. But the most interesting representations of missing
values occur when the application itself enforces mandatory population of a
data field for which no value may be available.
In some cases, the data entry person can’t progress past a certain screen until
a set of mandatory fields is completed, and when no values are available, the
data entry person is forced to enter something, leading to some very creative
(and some not so creative) representations of missing data, such as “Not
Available,” “NA,” “N/A,” “None,” “none,” “no name,” as well as multiple versions
of question marks and other random character strings. Across data sets,
the variety of values tends to grow with the number of people entering data,
especially when there are no data governance policies or standards.
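
As a rough illustration of how a profiling pass might surface these stand-in values, here is a minimal Python sketch. The placeholder list and the function names (`is_probable_null`, `profile_nulls`) are hypothetical and assumed for this example; they are not taken from any particular profiling tool, and a real list would have to be built from the data set at hand.

```python
import re
from collections import Counter

# Hypothetical starter list of placeholder strings, compared after
# trimming whitespace and lowercasing. Extend it for your own data.
PLACEHOLDERS = {
    "not available",
    "na",
    "n/a",
    "none",
    "no name",
    "no phone number provided",
}


def is_probable_null(value):
    """Return True if the value looks like a stand-in for missing data."""
    if value is None:
        return True
    text = value.strip().lower()
    if text == "" or text in PLACEHOLDERS:
        return True
    # Values made up only of punctuation, such as "???" or "--".
    return re.fullmatch(r"[\W_]+", text) is not None


def profile_nulls(values):
    """Count each distinct raw value that looks like a null representation."""
    return Counter(v for v in values if is_probable_null(v))
```

Calling `profile_nulls` on a column of raw values returns a count for each distinct suspected null representation, which gives a quick view of how many variants are in use. Note that this sketch will not catch truly random character strings; those usually have to be found by reviewing a frequency distribution of the field's values.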
So we have an interesting range of potential representations for null values,
from well-formatted character strings to somewhat haphazardly created
data values. Yet, as we will see next week, the question of the formatting and
representation of null or missing values is less interesting than the meaning of
missing data…