The Role of Data Profiling in Data Quality Assessment
By Elliot King
Along with “sustainability,” perhaps the biggest buzzword flying around many corners of the
corporate world these days is assessment. It seems people can’t breathe without
somebody wanting to assess the quality of the air, the efficiency of their lungs,
and, of course, the outcome of the breath.
But just because something is a buzzword doesn’t mean it is a bad thing. So how
do you go about assessing the quality of your data? With a tip of the hat to the
idea that you have to know your starting point before you can map a path to a
finishing line, the first step in many data quality programs is data profiling.
Without getting too technical, the data profiling process generates and collects
descriptive statistics about the data, such as minimum, maximum, mean, mode,
percentile, standard deviation, frequency, and variation, as well as aggregate
statistics such as sum and count. These statistics are analyzed to reveal and
validate data patterns and formats, uncover duplicate data from different data
sources, identify missing data, and confirm that data values are valid.
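For readers who want to see what such a profile looks like in practice, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a description of any particular profiling tool: the function names profile_column and pattern_of are hypothetical, the sample values are invented, and treating None and empty strings as "missing" is just one possible convention.

```python
from collections import Counter
import re
import statistics


def profile_column(values):
    """Return a small profile of one column.

    Assumption: None and "" are treated as missing values.
    """
    present = [v for v in values if v is not None and v != ""]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    profile = {
        "count": len(values),                       # aggregate: row count
        "missing": len(values) - len(present),      # completeness check
        "distinct": len(set(present)),              # number of unique values
        "frequency": Counter(present).most_common(3),
    }
    if numeric:
        profile.update(
            minimum=min(numeric),
            maximum=max(numeric),
            total=sum(numeric),                     # aggregate: sum
            mean=statistics.mean(numeric),
            mode=statistics.mode(numeric),
            # Sample standard deviation needs at least two data points.
            stdev=statistics.stdev(numeric) if len(numeric) > 1 else 0.0,
        )
    return profile


def pattern_of(text):
    """Reduce a string to a format pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", text))


# Invented sample data for illustration only.
ages = [34, 41, None, 29, 41, 230]
zip_codes = ["21201", "21210", "2120", "", "21201-1234"]

print(profile_column(ages))
print(Counter(pattern_of(z) for z in zip_codes if z))
```

Even on this tiny sample, the profile surfaces the kinds of issues described above: one missing age, a maximum of 230 that is almost certainly invalid, and a ZIP code whose pattern (four digits) does not match the others.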
Data profiling describes data in a way that makes its strengths and weaknesses
apparent and allows its accuracy and completeness to be determined. Based on
that assessment, remedial data quality improvement programs can be launched.
Moreover, during the last couple of years, data profiling has gotten a lot of
attention for the role it can play in master data management programs designed
to ensure the consistency of key non-transactional reference data used across
In the long run, data profiling can be used both tactically and strategically.
Tactically, it can serve as an integral part of data improvement programs.
Strategically, it can help managers determine the appropriateness of different
data source systems under consideration for deployment in a particular project.