Techniques for Address Standardization
By David Loshin
But knowing that we can, with some degree of precision, map a location to its nearest geocoded location, let’s think about a more general challenge: resolving a provided descriptive address to an actual known address.
Let me clarify this a little. When I talk about a “provided descriptive address,” I am referring to what an individual has presented as an address. While another person might be able to infer enough meaning from a presented address to make a delivery, the address may contain misspellings, errors, or other variations that prevent it from being adequately mapped to a specific geocoded location.
Aside from the other benefits we have already considered, transforming the address into a standard form will simplify the geocoding process. That transformation process leverages a few straightforward ideas, namely:
- There is a representative model for “standardized” addresses with its accompanying formats, syntax, acceptable value lists, and rules.
- An application is able to scan a non-standardized (or what I called a “provided descriptive”) address and differentiate between the parts that meet the standard and the ones that do not.
- There is a way to map the non-standard parts into standard ones.
In fact, all three of these ideas are doable, and over the next set of postings let’s look at each one in greater detail.
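To make the three ideas a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny value lists stand in for a real standard model, and the names `STANDARD_SUFFIXES`, `STANDARD_DIRECTIONALS`, `VARIANT_TO_STANDARD`, `classify`, and `standardize` are hypothetical. It only handles known variants of suffixes and directionals; misspellings and other errors would need more than a lookup table.

```python
import re

# Idea 1: a (hypothetical, tiny) model of acceptable standard values.
# A real standard would define far more values, plus formats and rules.
STANDARD_SUFFIXES = {"ST", "AVE", "RD", "BLVD"}
STANDARD_DIRECTIONALS = {"N", "S", "E", "W"}

# Idea 3: a (hypothetical) mapping from non-standard variants to standard values.
VARIANT_TO_STANDARD = {
    "STREET": "ST", "STR": "ST",
    "AVENUE": "AVE", "AV": "AVE",
    "ROAD": "RD",
    "BOULEVARD": "BLVD",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}


def tokenize(address):
    """Replace periods and commas with spaces, uppercase, split on whitespace."""
    return re.sub(r"[.,]", " ", address).upper().split()


def classify(token):
    """Idea 2: tell tokens that meet the standard from ones that do not."""
    if token in STANDARD_SUFFIXES or token in STANDARD_DIRECTIONALS:
        return "standard"
    if token in VARIANT_TO_STANDARD:
        return "mappable"
    # Numbers, street names, and anything else this sketch does not check.
    return "unchecked"


def standardize(address):
    """Rewrite each known variant to its standard value; keep other tokens."""
    return " ".join(VARIANT_TO_STANDARD.get(t, t) for t in tokenize(address))


if __name__ == "__main__":
    provided = "123 north Main Street."
    print([(t, classify(t)) for t in tokenize(provided)])
    print(standardize(provided))  # 123 N MAIN ST
```

The lookup table is the simplest possible way to do the mapping. It is also the weakest part of the sketch, since it cannot tell from position whether a token such as “North” is a directional or part of a street name.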