The Need and the Mechanics of Address Standardization
By David Loshin
In my previous post, I conjectured that the existence of a standard for addresses
not only simplifies the delivery process but also helps ensure delivery accuracy.
Ultimately, delivery accuracy saves money: it reduces the effort needed to find
the location, and it eliminates the rework and extra costs of failed deliveries.
This is all well and good as long as you use the standard. The problem arises
when, for some reason, an address does not conform to it. If the address is
only slightly malformed (e.g., it is missing a postal code), the chances are
still good that the location can be identified. If the address has serious
problems (e.g., the street number is missing, there is no street, the postal
code is inconsistent with the city and state, or other components are missing),
resolving the location becomes much more difficult (and therefore more costly).
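To make the distinction between "slightly malformed" and "serious problems" concrete, here is a minimal sketch in Python. The `Address` fields, the `classify` function, and the tiny `ZIP_TABLE` are hypothetical illustrations, not part of any real standard or product; a real system would consult an authoritative postal reference file for the ZIP-to-city/state check.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical lookup table: ZIP code -> (city, state). A toy sample only;
# a real system would use an authoritative postal reference file.
ZIP_TABLE = {
    "20500": ("WASHINGTON", "DC"),
    "10001": ("NEW YORK", "NY"),
}


@dataclass
class Address:
    number: Optional[str]
    street: Optional[str]
    city: Optional[str]
    state: Optional[str]
    zip_code: Optional[str]


def classify(addr: Address) -> Tuple[str, List[str]]:
    """Return ('ok' | 'minor' | 'serious', list of problems found).

    Assumes components are already uppercased, so the city/state
    comparison against ZIP_TABLE is a simple equality test.
    """
    serious: List[str] = []
    if not addr.number:
        serious.append("missing street number")
    if not addr.street:
        serious.append("missing street")
    if not addr.city or not addr.state:
        serious.append("missing city or state")
    if addr.zip_code:
        expected = ZIP_TABLE.get(addr.zip_code)
        # Only check consistency when the ZIP is in our (toy) table.
        if expected and expected != (addr.city, addr.state):
            serious.append("ZIP inconsistent with city/state")
    if serious:
        return "serious", serious
    if not addr.zip_code:
        # The only defect is a missing postal code: still likely deliverable.
        return "minor", ["missing postal code"]
    return "ok", []
```

An address with only its postal code missing comes back as `("minor", ["missing postal code"])`, while one whose ZIP points to a different city is flagged as serious.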
There are two ways to deal with this problem. The first is to bite the bullet
and treat each non-standard address as an exception, forcing the delivery agent
to resolve it. The second attempts to fix the problem earlier in the process by
transforming a non-standard address into one that conforms to the standard.
Address standardization is actually not that difficult, especially when you have
access to a good standard. At the highest level, the process is first to
determine where the address does not conform to the standard, and then to
standardize the parts that do not conform.
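The two-step process (find where the address does not conform, then fix only those parts) can be sketched as follows. This is an illustration under stated assumptions: the per-component patterns and fixer functions in `RULES` are hypothetical simplifications, not the official standard, and a fixed value is not re-validated here.

```python
import re
from typing import Callable, Dict, List, Tuple

# A rule pairs a validator pattern with a fixer function (hypothetical).
Rule = Tuple[re.Pattern, Callable[[str], str]]


def fix_zip(raw: str) -> str:
    """Keep only digits; format nine digits as ZIP+4, else return the digits."""
    digits = re.sub(r"\D", "", raw)
    return f"{digits[:5]}-{digits[5:]}" if len(digits) == 9 else digits


# Simplified, illustrative rules; not the official standard.
RULES: Dict[str, Rule] = {
    # State: exactly two uppercase letters.
    "state": (re.compile(r"[A-Z]{2}"), lambda s: s.strip().upper()),
    # ZIP: five digits, optionally followed by a hyphen and four digits.
    "zip_code": (re.compile(r"\d{5}(-\d{4})?"), fix_zip),
    # City: uppercase letters separated by single spaces.
    "city": (re.compile(r"[A-Z]+( [A-Z]+)*"), lambda s: " ".join(s.split()).upper()),
}


def find_nonconforming(addr: Dict[str, str]) -> List[str]:
    """Step 1: list the components that fail their validator."""
    return [name for name, (pattern, _) in RULES.items()
            if name in addr and not pattern.fullmatch(addr[name])]


def standardize(addr: Dict[str, str]) -> Dict[str, str]:
    """Step 2: apply fixers only to the nonconforming components."""
    fixed = dict(addr)
    for name in find_nonconforming(addr):
        _, fixer = RULES[name]
        fixed[name] = fixer(addr[name])
    return fixed
```

For example, `standardize({"city": "washington ", "state": "dc", "zip_code": "20500 0005"})` yields `{"city": "WASHINGTON", "state": "DC", "zip_code": "20500-0005"}`, while components that already conform pass through untouched. Separating detection from repair also makes it easy to log which parts of an address needed fixing.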
As I noted in my previous post, an address captures the incremental knowledge
needed to resolve the location, and we can use this fact, plus the information
provided in the standard, to fix non-standard addresses. Each component has its
specific place within the address, and there are standards for abbreviations
(such as ST for “street” or AVE for “avenue”) as well as for common terms (such
as ATTN for “attention”).
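Because each component has a specific place, abbreviation rules can be position-aware. The sketch below is a minimal illustration: `SUFFIXES` and `TERMS` are a small hypothetical subset of standard abbreviations (a real system would load the complete official list), and `normalize_line` only abbreviates a street suffix when it is the last token and an attention term when it is the first.

```python
import re

# Illustrative subset of standard abbreviations (hypothetical tables; a real
# system would load the full official list).
SUFFIXES = {"STREET": "ST", "STR": "ST", "AVENUE": "AVE", "AV": "AVE",
            "BOULEVARD": "BLVD", "ROAD": "RD", "DRIVE": "DR"}
TERMS = {"ATTENTION": "ATTN", "ATT": "ATTN"}


def normalize_line(line: str) -> str:
    """Uppercase, drop periods, commas, and colons, then abbreviate by position."""
    tokens = re.sub(r"[.,:]", "", line.upper()).split()
    if not tokens:
        return ""
    # Only the first token is treated as a possible attention term...
    tokens[0] = TERMS.get(tokens[0], tokens[0])
    # ...and only the last token is treated as the street suffix.
    tokens[-1] = SUFFIXES.get(tokens[-1], tokens[-1])
    return " ".join(tokens)
```

With this rule, `"123 Main Street"` becomes `"123 MAIN ST"` and `"Attention: Billing"` becomes `"ATTN BILLING"`, while `"1211 Avenue of the Americas"` keeps AVENUE intact because it is part of the street name rather than a trailing suffix.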
One can define a set of rules to check whether the address has all the right
pieces, whether they are in the right place, and whether they use the officially
sanctioned abbreviations. You can also use rules to move parts around and to map
commonly used terms to the standard ones, and use lookup tables to fill in the
blanks when data is missing. So in many cases, it is straightforward to rely on
tools and methods to automatically transform non-standard addresses into standardized