Data Quality Tips from Our Experts
DQT Assistant Manager
Increasing the Speed of Melissa Data Libraries
There are several architectural, language, and optimization changes that can speed up processing. They are listed below in order, with the suggestions most likely to increase speed at the top. These are not the only measures you can take, but in our experience they are the most effective.
1. Make sure you are using one object per batch process.
You only need to initialize one instance of an object to process a batch list. Make sure you are not creating and re-initializing a new instance for every record unless you absolutely have to.
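The following is a minimal Python sketch of the pattern, not Melissa Data's API. `AddressVerifier` is a hypothetical stand-in whose constructor represents the expensive initialization step, and its small set of ZIP Codes is placeholder data rather than real reference data. The point is only that the object is built once, outside the loop, and reused for every record.

```python
class AddressVerifier:
    """Hypothetical stand-in for a component object (not the real API).

    The constructor represents the expensive initialization step, such as
    loading reference data; verifying a single record is cheap.
    """

    init_count = 0  # how many instances have been initialized so far

    def __init__(self):
        AddressVerifier.init_count += 1
        # Placeholder "reference data"; a real component loads its own files.
        self._known_zips = {"12345", "67890"}

    def verify(self, record):
        # Return a copy of the record flagged with whether its ZIP is known.
        return {**record, "valid": record["zip"] in self._known_zips}


def process_batch(records):
    # Create and initialize ONE object for the whole batch...
    verifier = AddressVerifier()
    # ...then reuse it for every record instead of re-creating it each time.
    return [verifier.verify(record) for record in records]


if __name__ == "__main__":
    batch = [{"zip": "12345"}, {"zip": "99999"}, {"zip": "67890"}]
    print(process_batch(batch))
    print("instances initialized:", AddressVerifier.init_count)
```

Moving `AddressVerifier()` inside the list comprehension would repeat the initialization once per record, which is exactly the cost this tip avoids.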
2. Move to a more optimized programming language.
The fastest language to use is C++, because our components are written in C++. However, any modern object-oriented language will provide fairly similar speeds. The main exception is T-SQL: SQL Server was not designed to call third-party components efficiently, and it will process contact data anywhere from three to 10 times slower. If your data is stored in SQL Server, we recommend invoking our components from another programming language (such as C#, VB.NET, C++, or Java) and connecting to the SQL Server to retrieve and store the data. Also keep in mind that the time spent on selects, inserts, and updates is sometimes mistaken for our components' processing time, when those database operations alone can account for a significant portion of the overall time.
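The sketch below illustrates this split under a few assumptions: Python's built-in `sqlite3` module stands in for a SQL Server connection (in practice you would use a SQL Server driver), and `verify_address` is a hypothetical placeholder for a call into our component. Rows are read with one SELECT, verified in application code, and written back with a single batched UPDATE inside one transaction. Database time and verification time are measured separately, so the cost of the SQL statements is not attributed to the component.

```python
import sqlite3
import time


def verify_address(zip_code):
    """Hypothetical placeholder for a call into the verification component."""
    return len(zip_code) == 5 and zip_code.isdigit()


def process_table(conn):
    """Verify every row of `contacts` outside SQL and store the results.

    Returns separate timings so database work is not mistaken for
    component processing time.
    """
    start = time.perf_counter()
    rows = conn.execute("SELECT id, zip FROM contacts").fetchall()
    fetched = time.perf_counter()

    # Verification runs in the host language, not inside T-SQL.
    updates = [(1 if verify_address(zip_code) else 0, row_id)
               for row_id, zip_code in rows]
    verified = time.perf_counter()

    # Write all results back in one transaction (committed on success).
    with conn:
        conn.executemany("UPDATE contacts SET valid = ? WHERE id = ?", updates)
    stored = time.perf_counter()

    return {
        "db_seconds": (fetched - start) + (stored - verified),
        "verify_seconds": verified - fetched,
    }


if __name__ == "__main__":
    # In-memory SQLite database standing in for SQL Server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, zip TEXT, valid INTEGER)")
    conn.executemany("INSERT INTO contacts (zip) VALUES (?)", [("12345",), ("ABCDE",)])
    conn.commit()
    print(process_table(conn))
```

If `db_seconds` dominates the total, the bottleneck is the database round trips rather than the component itself.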
3. Order the data by ZIP Code™.
The reference data we use to verify and look up address information is stored in ZIP Code order. If your data is also in ZIP Code order, less data is moved in and out of cache, which speeds up processing.
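Here is a minimal sketch of sorting a batch by ZIP Code before processing. It assumes each record is a dict with a `"zip"` string (five digits or ZIP+4) and that `verify` is a hypothetical callable wrapping the component. ZIP Codes are compared as strings, so keep them as zero-padded text rather than integers (an integer would turn `01234` into `1234`). Each record's original position is remembered so results come back in the caller's order.

```python
def process_in_zip_order(records, verify):
    """Process records sorted by 5-digit ZIP Code, return results in input order.

    `records` are dicts with a "zip" string such as "01234" or "01234-5678";
    `verify` is a hypothetical callable standing in for the component call.
    """
    # Pair each record with its original position.
    indexed = list(enumerate(records))
    # Sort by the 5-digit ZIP so consecutive lookups touch nearby reference data.
    # Python's sort is stable, so records with the same ZIP keep their order.
    indexed.sort(key=lambda pair: pair[1]["zip"][:5])
    processed = [(position, verify(record)) for position, record in indexed]
    # Restore the caller's original order before returning.
    processed.sort(key=lambda pair: pair[0])
    return [result for _, result in processed]
```

If downstream steps do not depend on record order, the final re-sort can be skipped.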
4. Increase memory (RAM) or improve other hardware.
If you have 1 GB of memory or less, increasing to 2 or 4 GB can significantly increase processing speed. This is the easiest and most effective hardware upgrade. Another option is a faster hard drive (SCSI or solid state).
5. Use multi-threading.
Our components are thread safe and can run as multiple instances in multiple threads. Additional threads let you take full advantage of CPU time and of multiple cores. Adding threads increases throughput up to a point, but each additional thread provides diminishing returns. We recommend 2-3 threads per core.
Disclaimer: Do not use multi-threading until you are comfortable and experienced with it. Data access within our components is thread safe, but you must maintain thread safety in your own code when accessing our libraries. Rule of thumb: one thread per instance.
6. Cut out COM Interop.
Our Windows components are available in two flavors: COM and standard DLL. Both share the same core verification engine, but the COM version adds a COM Interop layer to facilitate communication with many popular programming languages. If you are experienced with both COM and non-COM development, consider using the standard DLL to remove the COM Interop layer and reduce the amount of data marshalling. For .NET users, sample code is available for both the COM version (samples directory) and the standard DLL (interfaces/NET directory).