4 SaaS Data Integration Challenges to Keep an Eye Out For

One of the hottest areas of the software market during the past few years has been the delivery of new software products through SaaS offerings. The SaaS model can take many different forms and views, from a hosted application that end users log into and use via their Web browsers, to specific, discrete application processing services and technologies that are…


Record Linkage & Fuzzy Matching Part 2a (More on “Blocking” for Performance Improvement)

Over at the LinkedIn Group for Data Matching run by Henrik Liliendahl Sorensen, Bill Winkler, principal researcher at the U.S. Census Bureau, has shared several reference papers on "blocking." They are excellent, and I wanted to share them with you. According to Winkler, "The following three papers are primarily concerned with 'blocking.' The third gives a methodology for estimating…


Record Linkage and Fuzzy Matching Part 2

This blog series addresses the steps necessary for efficient data/record processing that includes a record linkage or fuzzy matching step. In Part 1, we covered the overall approach. Today, we will cover the following steps:

1. Categorize
2. Split records

In academia, these steps are known as creating a "Blocking Index." (We will cover cleansing next;…
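As a rough illustration of the "categorize, then split" idea behind a blocking index, here is a minimal Python sketch. It is not taken from the post itself: the field names (`last_name`, `zip`), the sample records, and the choice of blocking key (surname initial plus the first three characters of the postal code) are all assumptions made for the example. Records are grouped by key, and only records that share a block are paired for later fuzzy comparison.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: surname initial + first three characters of the ZIP code.
    return (record["last_name"][:1].upper(), record["zip"][:3])


def build_blocking_index(records):
    # "Categorize" each record by its key and "split" the records into blocks.
    index = defaultdict(list)
    for rec in records:
        index[blocking_key(rec)].append(rec)
    return index


def candidate_pairs(index):
    # Only records inside the same block are compared with each other.
    for block in index.values():
        yield from combinations(block, 2)


# Made-up sample data, for illustration only.
records = [
    {"id": 1, "last_name": "Smith", "zip": "92688"},
    {"id": 2, "last_name": "Smyth", "zip": "92688"},
    {"id": 3, "last_name": "Jones", "zip": "10001"},
    {"id": 4, "last_name": "smith", "zip": "92610"},
]

index = build_blocking_index(records)
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(index)]
print(pairs)
```

With four records, an exhaustive comparison would need six pairs; under this assumed key only the pairs inside the shared block are generated. A coarser key keeps more true matches together at the cost of more comparisons, so the key choice is a trade-off rather than a fixed rule.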


Survey: Most Common BI Problem…Can You Guess What It Is?

Recent survey results uncovered by the U.K.-based Business Application Research Center (BARC) surprised even its researchers! This year, for the first time, the biggest company complaint about Business Intelligence (BI) wasn't slow query performance, company politics, or even a lack of end-user skills. Find out what an overwhelming majority of companies say is their biggest BI obstacle. www.melissadata.com/enews/articles/10212010/2.htm …


Record Linkage and Fuzzy Matching Part 1

This is the first in a series of posts on identifying similar records between two different sources, or grouping records from a single source, based on the string values in existing columns. We will define an approach and review actual implementations with various tools and vendors' products. There are many facets to review. I would like to start by drawing from…
