By Elliot King
Over time, big data may prove worthy of the hype. But in the meantime, big data promises to overturn the traditional assumptions of data quality. In the most common workflow, data analysis starts with a problem that people want to solve.
They then identify the data needed to solve that problem. Data quality represents the processes used to ensure that the data is good enough for the intended purpose, with the key criteria being those old standbys: validity, timeliness, completeness, and so on.
As it stands now, big data starts at the end. Organizations look around and see that they have access to social media data such as Twitter messages. They have website traffic numbers. They have loads of digitally produced video and audio. And then, of course, there is sensor data all around us.
Seeing that data basically exists everywhere, organizations are trying to determine what they might do with it. In other words, they have data in search of a problem rather than a problem in search of the data that could solve it.
As a result, while many, or perhaps most, of the principles of traditional data quality still apply, some principles may now apply more than others. Perhaps the most important is developing effective metadata. Big data is complex. It usually comes from multiple sources.
It may be used across the enterprise, and there is a lot of information involved. Accurate and appropriate metadata is critical for the efficient use and interpretation of big data. Data integration, data element classification, and data standards are also fundamental to the effective use of big data.
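To make the metadata point concrete, here is a minimal sketch of what a descriptive metadata record for one incoming data source might look like. The field names and the `DatasetMetadata` class are illustrative assumptions, not a standard; the idea is simply that each source arriving from outside the organization carries a record of where it came from, when it was captured, and what shape it should have.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record for one incoming data source."""
    name: str                # dataset identifier, e.g. "twitter_stream"
    source: str              # where the data originates
    collected_at: datetime   # when this batch was captured
    schema: dict             # expected field name -> expected type
    record_count: int = 0    # how many records arrived in this batch

    def describe(self) -> str:
        # One-line summary used when cataloging sources across the enterprise.
        fields = ", ".join(f"{k}:{t}" for k, t in self.schema.items())
        return (f"{self.name} from {self.source} "
                f"({self.record_count} records; {fields})")

# Example: cataloging a batch of social media messages.
meta = DatasetMetadata(
    name="twitter_stream",
    source="Twitter API",
    collected_at=datetime(2013, 5, 1, tzinfo=timezone.utc),
    schema={"user_id": "str", "text": "str", "posted_at": "datetime"},
    record_count=125000,
)
print(meta.describe())
```

Even a record this simple supports the criteria named above: the schema field underwrites validity checks, the capture timestamp underwrites timeliness, and the record count gives a baseline for completeness.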
Big data is truly something new. And while the established principles of data quality are still relevant, their application must be rethought.