Leveraging SSIS in Data Governance, Data Quality and Data Profiling
First and foremost, we have set out at Melissa Data to provide a complete set of SSIS tools required to perform Data Quality as defined by Gartner. They are as follows:
2. Generalized Cleansing
3. Parsing & Standardization
In addition, we intend to demonstrate how these tools can be further used to provide a foundation for using the Microsoft SQL Server environment to support Data Governance.
Lastly, by leveraging the SSIS Development Platform and our existing capabilities, we can provide these capabilities cheaper, faster and better than any of our competitors.
The Data Governance Pyramid
The pyramid below illustrates that on the road to Data Governance you need a foundation of Data Quality Tools and a Dynamic Metadata Repository. Our
goal is to not only support the tactical needs for Data Quality, but the
strategic needs for Data Governance as well as Master Data Management. We are
uniquely qualified to accomplish this based on our experience in address
correction, proprietary libraries and other enrichment capabilities.
we look at the foundation of capabilities required for Data Governance and/or
Data Quality we see the six capabilities defined by Gartner.
Profiling is required in some form for all Data Quality, Data Governance, MDM
or Data Integration projects. As illustrated above, Data Profiling is the
underpinning of all successful projects. By providing expanded and fully
capable Data Profiling SSIS Tools, we provide all building blocks required to
support any Data Governance, Data Quality or Data Integration projects.
will come back to this point in regards to our marketing strategy later.
simply leveraging the Microsoft SSIS Development environment, which Microsoft
intended in their design and open access to the internal programming object
model we have extended the SSIS Platform exponentially.
in this article we will discuss the internals required for Data Profiling and
how by leveraging the SSIS environment you can concentrate our programming
resource on the core BUSINESS REQUIREMENTS functionality required for these
areas and not deal with the technical integrations issues already covered by
a developer or user will try to process a data set and find some type of
anomalies. Data type mismatch, format
mismatch, and incorrect values. The ways the developer attempts to understand
the various problems usually involve manual coding of queries to verify and
analyze the source dataset and to develop code to fix them.
larger projects, this would have been accomplished as a part of a formal source
analysis by a skilled Data Analyst; however the vast majority of project involving
SSIS are small or understaffed and underestimated. The power of Microsoft’s ETL
Tool (SSIS) is often overestimated, it still lacks many required features of
mature or Open source ETL/Data Quality
Tool (Informatica, Data Flux ,Talend, Data Integrator and Datastage. We now
provide those tools.
real appeal is that the tool (SSIS) is viewed as “Free” and this is exactly
what the Microsoft Marketing plan was starting in 2005 with the introduction of
Microsoft SQL Server 2005 SSIS. Secondly,
while other “Free” open source tools are available, they require additional
training , while companies that have Microsoft SQL Server already have readily
available resources and a large “community” of willing teachers and free
samples of how to resources.
of customers are turning to SSIS for the data integration and ETL needs. And,
most are finding some gaps in functionality. Here at Melissa Data, our plan has
been to fill these gaps. Customers are turning away from third party tools in
droves, and since Melissa Data’s Data Quality Components for SSIS (DQC) are
integrated into SSIS, they are seen as add ons, not separate tools requiring
separate training and skills.
are showcasing our Data Quality Tools and Data Profiling tools at a series of
Education Events sponsored by PASS and supported by Microsoft. Ira Whiteside
has been invited to speak at most major events and will be presenting the
National SQL Rally in Orlando
in May on the topic of “Creating a Metadata Mart on the Road to Data
Governance.” We also have scheduled a series of webinars. For
accelerated Market Penetration, we are currently working on a series of
Community Edition is not a demo, it is a limited function version that can be
used in production. In order to participate in the Community Edition campaign,
please send an email to Ira@MelissaData.com
with “Community Edition” in the subject line, and your full name and company in
SSIS Stock and
other Data Profiler vs. Melissa Data Profiler
we examine the SSIS Data Profiler Task, we can see that it has the basic
feature you would expect, and how it lacks the ability to read any formats
except SQL Server 2005/2008, only targeting XML. These are severe imitations.
Melissa Data’s Data Quality Components for SSIS (DQC) introduced in 2009
provide most of the capabilities of the existing SSIS Data Profiler, however
they have been implemented as Data Flow Components so they can source any
platform and target repository on any platform. This is a huge advancement in
SSIS capabilities and a major competitive advantage.
can see from the comparison above that with the Melissa Data Profiling SSIS
Components, the SSIS Platform is able to compete and exceed most competitors’
of the major obstacles most vendors face in providing data profiling tools are:
1. 1. Speed required reading
millions of rows and the code base required to support the creation and
maintenance of a DMR (Dynamic Metadata Repertory). In our implementation we
have leveraged the existing SSIS Pipeline capabilities to provide a High speed
and scalable Data Profiling set of tools.
2. 2. Dynamic Metadata Repository
(DMR) supporting history: Here we use existing SSIS capabilities to allow the
user to load and maintain a data driven custom historical on any platform.
3. 3. Reporting/Visualize: Here
again we rely on the Microsoft SQL Server Reporting Services and Analysis Services
as well as Microsoft Office Tools (Excel, PowerPoint) connected to the
4. 4. Multiplatform data
profiling is supported with our Melissa Data Profiling components based on the
use of the SSIS Tool and the “Pipeline.” As long as the client has a single
license for SQL Server 2005/2005 they can use our Data Profiling Tools to read
any source they have API/Drivers for and write the repository to any platform.
Profiling is critical to all integration or data quality efforts. The need is
pervasive and ever present. We plan to use feedback on the Community Edition to
help guide our future development.