By Elliot King
Big data means more sources of data: sensors, social media, the Web and so on. And it means more potential for data to have a big impact on an organization, economically or otherwise. At times, it seems like figuring out how to capitalize on social media data is the new holy grail of business intelligence.
But big data also means big challenges for data quality. While a lot of the issues are pretty clear, many of the answers aren’t. For starters, in many cases, big data defies the traditional definition of data quality. Typically, data quality is determined by its intended use. Is the data good enough to allow the associated business process to function efficiently and as expected?
But in many cases, people are still trying to understand the best way to gain value from big data. Moreover, the first use of a big data repository will probably not be the last use.
And the first team that uses the data in an enterprise may not be the last. So the definition of data quality for big data has to take into account its suitability for reuse as well as the purposes initially intended. The realization that people with different problems to solve may want access to the data should also be factored in.
And it doesn’t get any easier from there. For semi-structured and unstructured data, data quality attributes and artifacts may be tricky to create. Those data quality attributes have to be incorporated into the metadata about the object. And did I mention that although metadata for the data container may exist, metadata about the content might not for some big data types? Are you starting to feel like you are pulling on a piece of yarn and the ball just keeps unraveling with no end in sight?
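To make the container-versus-content distinction concrete, here is a minimal sketch of deriving content-level quality attributes for a semi-structured record. The field names and the completeness measure are illustrative assumptions, not a standard; real metadata schemes will differ.

```python
# Hypothetical sketch: a data container (a JSON record) may carry container
# metadata (source, timestamp), but content-level quality attributes like
# completeness often have to be computed and attached separately.
# All field names here are illustrative assumptions.

def content_quality_metadata(record: dict, expected_fields: list) -> dict:
    """Compute simple content-level quality attributes for one record."""
    present = [f for f in expected_fields if record.get(f) not in (None, "")]
    return {
        # Share of expected fields that actually carry a value.
        "completeness": len(present) / len(expected_fields),
        # Which expected fields are empty or absent.
        "missing_fields": [f for f in expected_fields if f not in present],
    }

# Example: a social media post with no geolocation.
post = {"user": "@example", "text": "big data!", "geo": None}
print(content_quality_metadata(post, ["user", "text", "geo"]))
```

A downstream team could store the returned attributes alongside the container metadata, so that later, unanticipated users can judge whether the data suits their purpose.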
Big data requires people to rethink data quality issues. The old rules still apply, but they have to be applied differently.