Distributed Data and Distributed Information

By David Loshin

You might not realize how broad your electronic footprint really is. Do you have any idea how many data sets contain information about any specific individual? These days, any interaction you have with any organization is likely to be documented electronically. And if you are curious enough to read the fine print of the “privacy” policies, you might not be surprised to find that many of the organizations managing information about you are sharing it with others.

Actually, this is not a new phenomenon; data aggregator companies, who just love
to collect data and turn it into salable products, have been doing this for many
years. The easiest example to share is that of the mailing list company with
a reference database that can be segmented across numerous geographic
dimensions (in increasing precision, such as state, county, town, ZIP code,
ZIP+4, street name, etc.) as well as demographic dimensions such as number of
cars owned, favorite leisure activities, or household income.
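
To make the idea of segmentation a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the records, the field names, and the `segment` helper are invented for illustration and are not taken from any real aggregator's database.

```python
from collections import defaultdict

# Hypothetical reference records; every name and value here is made up.
RECORDS = [
    {"name": "A. Smith", "state": "MD", "county": "Montgomery", "cars": 2, "leisure": "fishing"},
    {"name": "B. Jones", "state": "MD", "county": "Montgomery", "cars": 1, "leisure": "golf"},
    {"name": "C. Lee", "state": "MD", "county": "Howard", "cars": 2, "leisure": "fishing"},
    {"name": "D. Patel", "state": "VA", "county": "Fairfax", "cars": 3, "leisure": "fishing"},
]


def segment(records, *dimensions):
    """Group record names by the values of the given dimensions, in order."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        groups[key].append(rec["name"])
    return dict(groups)


# Coarse geographic segment, then a finer one.
by_state = segment(RECORDS, "state")
by_county = segment(RECORDS, "state", "county")

# Mixing a geographic dimension with a demographic one.
md_fishing = segment(RECORDS, "state", "leisure")[("MD", "fishing")]
```

Each additional dimension narrows the segment, which is how a single list can be sold many different ways.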

And any time you fill out some form or respond to some survey or another, more
information about you is captured. Remember that registration card you filled
out for the toaster you bought? The survey you filled out to get that free
subscription? Didn’t you subscribe to some magazine about fishing and other
outdoors activities? How about that contest you entered at the county fair?

Of course, you are not the only one supplying information about you. Did you buy a
house? Home sales are reported to the state and the data is made available,
including address, sales price, and often the amount of your mortgage. Wedding
announcements, birth announcements, and obituaries log life cycle events.

Every single one of these artifacts captures more than just some information
about an individual: it also records the time and place at which that information
was collected, sometimes with high precision (such as the time of an online
order) and sometimes with less (such as the day the contest entry was collected
from the box).

There are many distributed sources of information about customers, and each
individual piece of collected data holds a little bit of value. But when these
distributed pieces of data are merged together, they can be used to reconstruct
an incredibly insightful profile of the customer. How does this work? More in
the next set of posts.