We provide numerous APIs to handle the standardization, cleansing, and
verification of data elements such as addresses, names, phone numbers, and
email addresses. Customers often ask us how they can improve speed when
processing records with our APIs. Below are some performance tips to consider
when implementing them:

The type of medium where the data files are stored may have an impact on
processing speed. We recommend storing our data files on solid state drives, as
they have faster seek and read times than spinning disk drives.
We have also sometimes seen clients store the data files on a network share,
which we definitely don’t recommend. Accessing the files over a network
typically introduces latency, which slows processing when blocks of data need
to be fetched quickly. Data files should therefore generally be stored locally
on the machine.
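
One way to compare candidate storage locations is to time random block reads against a file on each of them. The following is only a rough sketch: the function name `time_random_reads`, the 4 KB block size, and the temporary stand-in file are all assumptions made for illustration, not part of our APIs.

```python
import os
import random
import tempfile
import time


def time_random_reads(path, block_size=4096, reads=200, seed=0):
    """Return the mean time in milliseconds of `reads` random block reads."""
    size = os.path.getsize(path)
    max_offset = max(size - block_size, 0)
    rng = random.Random(seed)  # fixed seed so runs hit the same offsets
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            f.seek(rng.randint(0, max_offset))
            f.read(block_size)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / reads


# Stand-in file so the sketch runs anywhere. To compare drives, point `path`
# at a copy of a real data file on each candidate location instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1024 * 1024))
    path = tmp.name
try:
    mean_ms = time_random_reads(path)
    print(f"mean read time: {mean_ms:.4f} ms")
finally:
    os.remove(path)
```

Keep in mind that the operating system caches recently read files, so a second run over the same file can look much faster than the storage medium really is. Timing the first run after a reboot, or using a file larger than available memory, gives a fairer comparison.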

The amount and speed of available memory can also affect processing. When the
APIs read data from the drive, that data is cached in memory in case it needs
to be accessed again shortly.
In the simple example below, we have a list of phone numbers that were verified
using our Phone Object API, along with the time in milliseconds taken to verify
each phone number.

When the Phone Object API encounters a phone number in a new area code, the
verify time spikes because the Phone Object has to go back and fetch a new
block from the data files on disk and cache it in memory. The more memory
available on the system, the more blocks can stay cached as the API reads them
from the data files. And, as discussed in the previous section, a faster drive
helps keep disk read times low when those data file reads occur.

Machines with more than a single core are now common. If multiple cores are
available on the system, we highly recommend that developers take advantage of
them. When multithreading with our APIs, we suggest that each thread create its
own instance of the API object.
For example, if you have 8 cores, you may want to create 8 threads with our API
instantiated 8 times, once per thread. Ideally, you would create a pool of
threads that already have our API instantiated and are therefore ready for
processing. If you keep reinitializing/instantiating our API, that will
introduce some overhead.
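
The pool-of-threads pattern described above might be sketched as follows. The class `AddressVerifier` and its `verify` method are hypothetical stand-ins for an API object that is costly to construct; they are not the real interface of any of our APIs, and the sample records are made up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AddressVerifier:
    """Hypothetical stand-in for an API object that is costly to construct."""

    instances_created = 0
    _lock = threading.Lock()

    def __init__(self):
        with AddressVerifier._lock:
            AddressVerifier.instances_created += 1

    def verify(self, record):
        return record.strip().upper()  # placeholder for the real verification call


_local = threading.local()  # holds one verifier per worker thread


def init_worker():
    # Runs once in each worker thread: build that thread's own instance up front.
    _local.verifier = AddressVerifier()


def process(record):
    # Reuse the instance owned by the current thread; no per-record instantiation.
    return _local.verifier.verify(record)


def run(records, workers=8):
    with ThreadPoolExecutor(max_workers=workers, initializer=init_worker) as pool:
        return list(pool.map(process, records))


records = [f" {n} high street " for n in range(100)]
results = run(records, workers=8)
print(len(results), AddressVerifier.instances_created)
```

However many records are processed, the verifier is constructed at most once per worker thread, so the instantiation cost is paid up front rather than per record. In Python specifically, whether the threads actually run in parallel depends on the underlying library releasing the interpreter lock during its calls, so measure before settling on a thread count.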

The graph and table below show some of our own multithreaded testing of our
Global Address Object® API with UK addresses. As shown, substantial speed
increases can be obtained through multithreading: