By Elliot King
Data profiling is the process of identifying which records are “broken.” It
consists of comparing your actual data to what you expect it to contain.
Since data flows into organizations via so many routes, errors are
inevitable. But if you never look for them, you won’t know the data is
flawed until something unexpected, usually unexpectedly bad, happens.
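The comparison at the heart of profiling can be sketched in a few lines: run each record past the rules you believe good data should satisfy and collect the violations. The field names and validity rules below are hypothetical examples, not a prescribed schema.

```python
# Minimal data-profiling sketch: check records against expected rules
# and report which records are "broken" and why.

def profile(records, rules):
    """Return (index, field, value) tuples for every rule violation."""
    problems = []
    for i, record in enumerate(records):
        for field, check in rules.items():
            value = record.get(field)
            if not check(value):
                problems.append((i, field, value))
    return problems

# Hypothetical customer records, two of them flawed.
customers = [
    {"name": "Ada Lovelace", "age": 36},
    {"name": "", "age": 36},
    {"name": "Alan Turing", "age": -5},
]

# What we think the data should look like.
rules = {
    "name": lambda v: isinstance(v, str) and v.strip() != "",
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}

for index, field, value in profile(customers, rules):
    print(f"record {index}: bad {field!r} value {value!r}")
```

Real profiling tools do far more (frequency distributions, pattern analysis, cross-column checks), but the principle is the same: expected versus actual.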
Once you know what’s wrong, you can set about fixing it. But like a house in
disrepair, you don’t have to fix everything at once. You may not want to
correct some errors at all if they have no significant impact. Other
mistakes may be so fundamental that you simply cannot risk using the data
at all. Sometimes a record is merely incomplete, and adding a placeholder, a
standard substitute value, may be enough. And all the other errors you find,
well, you will probably want to fix.
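Placeholder substitution is simple enough to sketch: for each incomplete field, swap in an agreed-upon substitute value rather than discarding the record. The field names and placeholder values here are illustrative assumptions.

```python
# Sketch of placeholder substitution for incomplete records.
# PLACEHOLDERS maps each field to its standard substitute value.
PLACEHOLDERS = {"phone": "UNKNOWN", "country": "N/A"}

def fill_placeholders(record, placeholders=PLACEHOLDERS):
    """Return a copy of the record with empty or missing fields filled."""
    filled = dict(record)
    for field, substitute in placeholders.items():
        if not filled.get(field):  # missing, None, or empty string
            filled[field] = substitute
    return filled

record = {"name": "Ada Lovelace", "phone": "", "country": None}
print(fill_placeholders(record))
```

The benefit of a standard substitute, as opposed to leaving the field blank, is that downstream systems can recognize it and treat it consistently.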
The next two elements of a data quality improvement program go beyond finding
and fixing what can be fixed. Many organizations have a boatload of redundant
data: a single customer’s name and address may be stored in numerous different
databases. Those records should be consolidated. The more places data is
stored, the greater the odds that inconsistencies will be introduced, and
inconsistencies inevitably lead to errors.
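Consolidation can be sketched as grouping duplicates under a normalized matching key and merging each group into one record, preferring non-empty values. Matching on normalized name and address, as below, is a simplifying assumption; production systems use much fuzzier matching.

```python
# Sketch of consolidating redundant customer records from several sources.

def normalize(record):
    """Build a simplistic matching key from name and address."""
    return (record["name"].strip().lower(),
            record["address"].strip().lower())

def consolidate(records):
    """Merge records sharing a key, keeping the first non-empty values."""
    merged = {}
    for record in records:
        key = normalize(record)
        current = merged.setdefault(key, {})
        for field, value in record.items():
            if value and not current.get(field):
                current[field] = value
    return list(merged.values())

# The same customer, stored slightly differently in two databases.
sources = [
    {"name": "Ada Lovelace", "address": "12 St James Sq", "phone": ""},
    {"name": "ada lovelace ", "address": "12 st james sq",
     "phone": "555-0100"},
]
for record in consolidate(sources):
    print(record)
```

Note that merging also repairs the data: the consolidated record ends up with the phone number that only one of the two sources held.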
Finally, the data you have may not be sufficient to address your business needs.
Good data quality improvement programs take steps to augment the existing
corporate information. The more good information you have, the more value you
can derive from it.