Bad data is a challenge for all organizations. Without data that is reliable, accurate and updated, organizations can’t deliver trusted customer, product and other vital data throughout the enterprise – resulting in waste and operational inefficiencies, less effective sales and marketing, poor customer service and potentially bad business decisions.
That’s why many companies choose to incorporate data quality directly into their data integration architecture. And to be successful, companies should opt for a program that is flexible and agile to meet evolving business requirements.
Microsoft SQL Server provides such a solution – SQL Server Integration Services (SSIS). This service offers a scalable enterprise data integration platform with exceptional Extract, Transform, Load (ETL) and integration capabilities – enabling organizations to more easily manage data from a wide array of sources, including XML data files, flat files and relational data sources.
And, increasingly, more third-party connectors are becoming available which helps further extend the power and capabilities of SSIS, including the ability to cleanse, standardize, dedupe and enrich data seamlessly within the pipeline.
SSIS also is a cost-effective way to enable these data cleansing and enrichment tasks – as it does not require the use of monolithic applications that would exert a lot of time, money and man-power to install and maintain.
Here are some examples of new packages available for SSIS:
- Validation and Cleansing
This package is a collection of transforms that can be integrated into a workflow to clean, validate and update all kinds of contact data including addresses, phone numbers, email addresses, etc.
- Parsing, Standardization and Enrichment
Use this package to parse and restructure data into a common format to assist in maintaining consistent data. This package can standardize addresses to a desired format, or to USPS® specifications, which is required for certain postal discounts. This package also includes geocoding, which enriches the value of data by appending geographical information that can help improve everything from marketing to logistics. Geocoding adds latitude/longitude coordinates to a postal address, as well as the Census tract and block which can be used to find additional demographic information. These, and other packages, can be easily dropped into SSIS as native components and plugged into workflows to cleanse and enrich data during ETL processes.
The deduplication of data from disparate sources is an ongoing challenge for organizations of all sizes. This package allows users to compare any data type of any length and from any place in a field to find duplicates fast. The program contains 12 “fuzzy” matching algorithms to find even the not so obvious duplicates – for example, ‘Beth Smith’ at UDM and ‘Smithe, Elizabeth’ at United Data Machines. Detecting the most duplicate records helps streamline databases, improve marketing efficiency and achieve a unified accurate view of the customer.
Profiling helps you develop a clearer picture of your data. This package includes several profiling tasks, including column statistics, value distribution and pattern distribution. These tasks analyze individual and multiple columns to determine relationships between columns and tables.
Monitoring Ongoing Data Quality
Databases that contain contact data are never static, because information is changing all the time as customers move, change companies, die or divorce. And new or possibly inconsistent data is coming in all the time from call centers, Web forms and data entry by various departments. That’s why it is important to add in a monitoring package in SSIS – to provide real-time data monitoring in an automated process to immediately recognize and correct issues before the quality of data declines. This approach also helps organizations enforce data governance and compliance measures.
Taking advantage of affordable and easy-to-integrate third-party data quality and enrichment packages can help organizations extend the capabilities of their investment in SSIS, helping expand the meaning between data sets, consolidate and validate contact data, and better synchronize business processes through a single view of master data.