The Many Dimensions of Data Quality
Melissa IN Team
When should you introduce a new product line?
How much of your advertising budget should you earmark for social media?
Should you renew your contract with a vendor?
No organization keeps count of the number of decisions it takes in a day. What most of these decisions have in common is that they are backed by data. Data is easy to collect, but simply having a ton of it is no guarantee of good decisions. If the data quality is good, your decisions should pay off; if the organization relies on poor-quality data, those decisions may prove expensive. So, what makes data good or bad?
6 Core Dimensions of Data Quality
Data can be judged along innumerable measures, but six core dimensions are commonly used to compare data quality: completeness, validity, accuracy, consistency, uniqueness and timeliness.
Completeness
The first dimension of data quality checks whether the data contains all essential information. Data can be considered complete even if certain parameters are missing, provided those parameters are optional.
For example, when entering their name in a lead generation form, some customers may enter their first and last names, some only their first name, and some their first, middle and last names. The middle name is usually considered optional while the first and last names are mandatory. Thus, the records in the first and third cases would be considered complete, while the record in the second case would be incomplete.
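A completeness rule like this can be sketched in a few lines; the field names and the mandatory/optional split below are assumptions chosen to match the example, not a standard schema:

```python
# Completeness check: mandatory fields must be present and non-empty;
# optional fields (e.g. middle name) may be missing without penalty.
MANDATORY_FIELDS = {"first_name", "last_name"}

def is_complete(record: dict) -> bool:
    """Return True if every mandatory field has a non-empty value."""
    return all(record.get(field, "").strip() for field in MANDATORY_FIELDS)

print(is_complete({"first_name": "Yogesh", "last_name": "Sharma"}))  # True
print(is_complete({"first_name": "Yogesh"}))                         # False
print(is_complete({"first_name": "A", "middle_name": "B", "last_name": "C"}))  # True
```

In practice, which fields count as mandatory is a business decision, which is why the rule is data-driven rather than hard-coded into the check itself.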
Validity
Data is considered valid if it conforms to a required format or syntax. Validation rules could include checking entries against a minimum-maximum range, a required entry length or a set of permitted values.
Take the address field, for example. The PIN code in India has six digits, so an entry with only five digits would be considered invalid. Similarly, say customer IDs must consist of two letters followed by four digits. In this case, AB0050 would be considered valid while AB005 would be invalid.
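Both rules from the example translate directly into regular-expression checks; this is a minimal sketch of that idea:

```python
import re

# Validity rules from the examples above: a 6-digit Indian PIN code, and a
# customer ID made of two letters followed by four digits.
PIN_CODE_RE = re.compile(r"^\d{6}$")
CUSTOMER_ID_RE = re.compile(r"^[A-Za-z]{2}\d{4}$")

def is_valid_pin(pin: str) -> bool:
    return bool(PIN_CODE_RE.match(pin))

def is_valid_customer_id(cid: str) -> bool:
    return bool(CUSTOMER_ID_RE.match(cid))

print(is_valid_pin("110001"))          # True  (six digits)
print(is_valid_pin("11000"))           # False (only five digits)
print(is_valid_customer_id("AB0050"))  # True
print(is_valid_customer_id("AB005"))   # False
```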
Accuracy
Accuracy refers to whether or not the data reflects real-world values. Accuracy can be judged by common sense as well as by checking the data against third-party data sets.
For example, if a person enters their name as Mickey Mouse, you can safely assume that the name is fake. However, to check the accuracy of a name like Yogesh Sharma, one would have to check the data against third-party records such as driving license records, government ID cards, etc.
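The "common sense" part of this check can be approximated with a denylist of obviously fictional names; the entries below are assumptions for illustration, and confirming that a plausible name is genuinely accurate would still require third-party reference data:

```python
# A first-pass plausibility filter: flag names on a denylist of obviously
# fictional entries. This catches only the blatant fakes, nothing more.
FAKE_NAMES = {"mickey mouse", "donald duck", "john doe"}

def looks_fake(name: str) -> bool:
    return name.strip().lower() in FAKE_NAMES

print(looks_fake("Mickey Mouse"))   # True
print(looks_fake("Yogesh Sharma"))  # False
```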
Accuracy and validity are closely related: for data to be accurate, its values must also be valid and in the correct syntax.
For example, let’s say the accepted format for dates is MM/DD/YYYY and a person’s birthdate is April 9th, 1985. If the person enters the date in the DD/MM/YYYY format, it would be 09/04/1985. However, it would be read as the 4th of September, 1985. This data may not match reference data simply because of wrong formatting and hence would be considered inaccurate.
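The ambiguity described above can be demonstrated with Python's standard date parsing; the string and the two formats come straight from the example:

```python
from datetime import datetime

# The same string parses to different dates depending on the assumed format,
# which is how a format mismatch silently produces inaccurate data.
entry = "09/04/1985"

as_mm_dd = datetime.strptime(entry, "%m/%d/%Y")  # the system's expected format
as_dd_mm = datetime.strptime(entry, "%d/%m/%Y")  # what the user actually meant

print(as_mm_dd.strftime("%B %d, %Y"))  # September 04, 1985
print(as_dd_mm.strftime("%B %d, %Y"))  # April 09, 1985
```

Note that both parses succeed without error, which is exactly why this class of inaccuracy is hard to catch with validity checks alone.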
Consistency
In many organizations, data is stored in multiple places. To be considered consistent, data values across these different records must match one another.
For example, the HR team and the Payroll team may both maintain independent employee lists. These lists must match each other. Let's say an employee leaves the organization and their name is removed from the HR list but not from the payroll list. In this case, the data would be considered inconsistent.
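A cross-list consistency check of this kind reduces to a set comparison; the employee IDs below are invented for illustration:

```python
# Consistency check: the employee IDs held by HR and Payroll should match.
hr_employees = {"E101", "E102", "E104"}
payroll_employees = {"E101", "E102", "E103", "E104"}

# E103 has left the organization: removed from HR but still on payroll.
only_in_payroll = payroll_employees - hr_employees
only_in_hr = hr_employees - payroll_employees

print(only_in_payroll)  # {'E103'}  -> inconsistent record to investigate
print(only_in_hr)       # set()
```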
Uniqueness
Uniqueness implies that each entity should appear only once in a data set and should be accessed through a single unique key. For a data set to be considered good quality, it should not contain any duplicate records.
When data is collected through multiple channels, there is a chance of duplication. A salesperson may enter a customer's name as R. Smith while a customer care representative on the phone may enter it as Robert Smith, leaving the organization with two records for the same person.
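A very crude way to catch this particular pair is to key names on the first initial plus the last name; this is a deliberately simplistic sketch, and real deduplication relies on far more robust fuzzy-matching techniques:

```python
def name_key(name: str) -> tuple:
    """Crude deduplication key: first initial + last name, lowercased.
    Matches 'R. Smith' with 'Robert Smith', but is easily fooled."""
    parts = name.replace(".", "").split()
    return (parts[0][0].lower(), parts[-1].lower())

print(name_key("R. Smith") == name_key("Robert Smith"))   # True  (likely duplicate)
print(name_key("R. Smith") == name_key("Rachel Smythe"))  # False
```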
Timeliness
For data to be considered good quality, it must be up to date. This dimension is particularly important for contact fields such as address, email and phone number. To ensure timeliness, data must be verified regularly: a record may have been correct when it was first created but may have changed since then.
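A timeliness policy can be sketched as a staleness window on the last verification date; the 365-day window below is an assumed policy, not a universal standard:

```python
from datetime import date, timedelta

# Timeliness check: flag contact records not verified within the last year.
REVERIFY_AFTER = timedelta(days=365)

def is_stale(last_verified: date, today: date) -> bool:
    """Return True if the record is overdue for re-verification."""
    return today - last_verified > REVERIFY_AFTER

print(is_stale(date(2020, 1, 1), date(2021, 6, 1)))  # True  (over a year old)
print(is_stale(date(2021, 3, 1), date(2021, 6, 1)))  # False (recently verified)
```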
Other Data Quality Dimensions
In addition to the 6 core dimensions, data quality can also be measured against other factors. This can be important when determining the usability of the data. For example, data may be accurate, valid, up to date, consistent, unique and complete but may need to be accessed by a person who speaks only Dutch. If this data were in English, it would serve no purpose since the user would be unable to understand it.
Quality can be assessed at various levels and data that is considered good quality according to its completeness at one level may be considered poor quality at another. For example, a customer record with the customer’s name, phone number and address that meets all the above factors may be considered good quality data for an organization’s product delivery team.
However, this record may be considered poor quality by the marketing team since it does not contain the customer’s age and gender.
Improving Your Data Quality
Assessing data quality is not a one-time exercise. It must be an ongoing process to ensure that data remains good according to all the above dimensions. At the same time, organizations must also ensure that this process does not inconvenience their customers.
For example, asking customers to verify their email address every month by clicking a link in an email may be tiresome. Instead, it is better to partner with an agency that specializes in global identity verification and data quality solutions.
These solutions don’t just check existing data against quality standards, they also enhance data by correcting typographic errors and completing fields. All of this without any extra effort from your team or your customers!