Cross-Table Integrity

Blog Administrator | Uncategorized

By David Loshin

One of the most challenging data flaws that appear in relational database systems is the absence of referential integrity across different tables. For example, consider a transaction processing system in which one table captures the quantity and total costs for purchased items, and each record refers to a product reference code that can be looked up in a product master table.
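To make the flaw concrete, here is a minimal sketch of a cross-table referential integrity check in Python. The table layouts, column names, and the in-memory representation are illustrative assumptions, not details from the post; in practice the same check is usually enforced with a foreign key constraint or run as a profiling query.

```python
# A minimal sketch of a cross-table referential integrity check.
# The record layouts and product codes below are invented for illustration.

transactions = [
    {"txn_id": 1, "product_code": "A100", "quantity": 2, "total_cost": 19.98},
    {"txn_id": 2, "product_code": "B205", "quantity": 1, "total_cost": 4.50},
    {"txn_id": 3, "product_code": "Z999", "quantity": 5, "total_cost": 25.00},  # no master entry
]

product_master = {"A100": "Widget", "B205": "Gasket"}

# Every product_code in the transaction table should resolve to a row in the
# product master; any that do not are referential integrity violations.
orphans = [t for t in transactions if t["product_code"] not in product_master]

for t in orphans:
    print(f"Transaction {t['txn_id']} references unknown product code {t['product_code']}")
```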
Read More

Data Quality Assessment: Value and Pattern Frequency

Blog Administrator | Analyzing Data, Analyzing Data Quality, Data Profiling, Data Quality, Data Quality Assessment

By David Loshin

Once we have started our data quality assessment process with column value analysis, we can move beyond the null value analysis discussed in the previous blog post. Because column analysis tallies the number of times each value appears in a column, we can use this frequency distribution to identify additional potential data flaws by considering several aspects of value frequency (as well as lexicographic ordering), including:

  • Range Analysis, which considers whether the values can be ordered and, if so, whether they are constrained within a well-defined range; a small sketch of frequency and range analysis follows below.
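As a rough illustration of both ideas, the Python sketch below tallies a value frequency distribution for a single column and then applies a simple range check. The column values and the expected range are invented for the example and are not taken from the post.

```python
from collections import Counter

# Illustrative column of values (e.g., recorded ages); 199 and -3 are suspect.
ages = [34, 28, 45, 28, 51, 34, 34, 199, 45, -3]

# Frequency distribution: how many times each value appears in the column.
frequency = Counter(ages)
print(frequency.most_common())

# Range analysis: the values can be ordered, so check whether they fall
# within a well-defined, expected range (here, 0-120 for a person's age).
EXPECTED_MIN, EXPECTED_MAX = 0, 120
out_of_range = sorted(v for v in frequency if v < EXPECTED_MIN or v > EXPECTED_MAX)
print("Out-of-range values:", out_of_range)
```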
Read More

Standardizing Your Approach to Monitoring the Quality of Data

Blog Administrator | Address Standardization, Analyzing Data, Data Cleansing, Data Integration, Data Management, Data Profiling, Data Quality

By David Loshin

In my last post, I suggested three techniques for maturing your organizational approach to data quality management. The first recommendation was defining processes for evaluating errors when they are identified. These processes involve a few key techniques:

1) An approach to specifying data validity rules that can be used to determine whether a data instance or record has an error, as in the sketch below.
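As a rough sketch of what specifying such rules might look like, the Python below expresses a few validity rules as named predicates and applies them to a single record. The field names and the rules themselves are illustrative assumptions, not rules defined in the post.

```python
# A minimal sketch of specifying data validity rules and applying them to a record.

validity_rules = {
    "quantity must be positive": lambda r: r.get("quantity", 0) > 0,
    "product_code must be present": lambda r: bool(r.get("product_code")),
    "total_cost must be non-negative": lambda r: r.get("total_cost", -1) >= 0,
}

def find_errors(record):
    """Return the name of every validity rule the record violates."""
    return [name for name, rule in validity_rules.items() if not rule(record)]

record = {"product_code": "", "quantity": 0, "total_cost": 12.50}
print(find_errors(record))
# ['quantity must be positive', 'product_code must be present']
```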

Read More