What is Data Profiling and Why Profile Your Data?

Melissa IN Team | Data Profiling

The assumptions we make about the data we store and provide are often imprecise. Despite all necessary precautions, our systems are never entirely free of bugs. As a result, data quality gets compromised, which can lead to several negative outcomes.

So, what can be done to prevent such situations? You need to profile your data.

Data Profiling

Data profiling is the process of evaluating data sources for structure and quality so that you can be confident in the accuracy of your data.

• Your data gets evaluated by comparing it to an existing data source.
• This comparison helps you draw the right conclusion about its accuracy.
• Profiling your data helps in determining its completeness, precision, and validity.

Most of the time, data profiling is combined with the Extract, Transform, and Load (ETL) process, which moves data from one location to another. Combining ETL and data profiling helps to cleanse the data, fix issues, and move quality data to the desired location. Profiling identifies the quality issues that require correction and, among them, the ones that can be fixed during the ETL process.

Why Is Data Profiling Important?

Using compromised data puts your entire project at risk. Data integration projects face problems and challenges similar to those seen across the IT industry. They include:

• Compromising quality to meet deadlines
• Lack of time
• Budget overrun
• Incorrect and insufficient understanding of the data source

These challenges and problems could be the result of certain issues, including the following:

• The difficulty of untangling data due to its huge volume
• The complexity of databases and applications
• The challenging and time-consuming nature of the process
• The susceptibility of the process to errors

The quality, structure, and content of data must be understood before the data is integrated or used in an application.

To understand the precision and quality of data, most data integration initiatives depend on external sources of information, such as the experience of staff, source programs, and documentation.

This external information is often wrong, outdated, or incomplete. This means you’ll have to put in more time, effort, and money to fix these issues and validate your data. If you fail to do so, you compromise the entire project.

Data profiling is necessary for the following:

• To understand the data
• To organize it
• To compare and verify that your data matches its source
• To ensure that the data meets standard statistical measures
• To make sure the data complies with the company’s business rules and regulations

Proper data profiling helps you to answer the following questions.

• Do you have the required data?
• Will that data be sufficient to complete your project in time?
• Is your data complete, or are there any blank values?
• How unique is your data?
• Does it support the requirements of your company?
• Does it accurately represent the needs of your organization?
• Is it possible to integrate, cross-reference, or consolidate the data for usability?
• What data requires cleaning?
• Has any data been duplicated?
• Are the data patterns anomalous?
• What data requires transformation?
• Can you be sure of its correctness and consistency?

Being able to answer these questions correctly helps ensure the quality of your data, which is necessary for the overall growth and success of your business.
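
Two of the questions above, whether there are blank values and how unique the data is, can be turned into simple numbers. The following is a minimal sketch in Python, not a description of any particular profiling product. The sample rows and the function names `completeness` and `uniqueness` are assumptions made for illustration.

```python
# Hypothetical sample rows; None or "" stands for a blank value.
rows = [
    {"name": "Asha", "email": "asha@example.com", "city": "Pune"},
    {"name": "Ravi", "email": None, "city": "Pune"},
    {"name": "Asha", "email": "asha@example.com", "city": ""},
    {"name": "Meera", "email": "meera@example.com", "city": "Delhi"},
]


def is_blank(value) -> bool:
    """Treat None and empty or whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def completeness(rows, column) -> float:
    """Share of rows in which the column holds a non-blank value."""
    filled = sum(1 for row in rows if not is_blank(row.get(column)))
    return filled / len(rows)


def uniqueness(rows, column) -> float:
    """Distinct non-blank values divided by all non-blank values."""
    values = [row[column] for row in rows if not is_blank(row.get(column))]
    return len(set(values)) / len(values) if values else 0.0


for column in ("name", "email", "city"):
    print(column, completeness(rows, column), round(uniqueness(rows, column), 2))
```

A completeness below 1.0 points to blank values, and a uniqueness below 1.0 points to repeated values that may or may not be legitimate duplicates.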

Data Profiling: The Different Techniques

In general, data profiling is done using three different techniques. They are the following.

1. Column Profiling Technique

This technique counts the number of times each value appears within each column of a table. It helps to discover the patterns in your data as well as to understand the frequency distribution.
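
As a rough illustration of column profiling, the sketch below counts value frequencies and also reduces each value to a "shape" so that unusual patterns stand out. The postcode column and the helper names are hypothetical, and real profiling tools may work differently.

```python
from collections import Counter
import re

# Hypothetical values pulled from a single postcode column.
postcodes = ["92688", "92688", "10001", "9268", "92688-1234", "10001", "ABCDE"]


def value_frequencies(values):
    """Count how many times each distinct value appears in the column."""
    return Counter(values)


def to_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def pattern_frequencies(values):
    """Count how many values share each shape."""
    return Counter(to_pattern(v) for v in values)


print(value_frequencies(postcodes).most_common(2))
print(pattern_frequencies(postcodes).most_common())
```

Here the dominant pattern is five digits, so the four-digit and all-letter values are the ones worth a closer look.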

2. Cross-Column Profiling Technique

There are two different processes under this technique of data profiling. They are:

• Key Analysis
• Dependency Analysis

Key analysis scans groups of values within a table to find a prospective primary key.
Dependency analysis identifies the dependent relationships and structures built into the data set. It is more complex than key analysis.
Both processes are used to identify dependencies and relationships among the attributes of data within a table.
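
Both processes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the table, its column names, and the functions `candidate_keys` and `determines` are invented for the example, and the search is limited to column sets of width two to keep it cheap.

```python
from itertools import combinations

# Hypothetical table as a list of rows.
rows = [
    {"order_id": 1, "customer": "A", "zip": "92688", "city": "Rancho Santa Margarita"},
    {"order_id": 2, "customer": "B", "zip": "10001", "city": "New York"},
    {"order_id": 3, "customer": "A", "zip": "92688", "city": "Rancho Santa Margarita"},
    {"order_id": 4, "customer": "C", "zip": "10001", "city": "New York"},
]


def is_unique(rows, columns) -> bool:
    """True if the column combination has no repeated values and no blanks."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if any(v is None for v in key) or key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, max_width=2):
    """Key analysis: minimal column sets (up to max_width) that identify each row."""
    columns = list(rows[0])
    found = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip supersets of a key already found, so only minimal keys remain.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found


def determines(rows, lhs, rhs) -> bool:
    """Dependency analysis: does each lhs value map to exactly one rhs value?"""
    mapping = {}
    for row in rows:
        if mapping.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


print(candidate_keys(rows))
print(determines(rows, "zip", "city"), determines(rows, "zip", "customer"))
```

A column that passes on a sample is only a prospective key. The result still needs to be confirmed against the full table and the business meaning of the column.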

3. Cross-Table Profiling Technique

This technique searches across tables to identify possible foreign keys. It also helps to identify the differences and similarities in data and syntax between the tables, which helps in removing data redundancy and in locating data sets that can be mapped together.
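
One simple way to look for possible foreign keys is an inclusion check: a column in one table is a candidate if every one of its values also appears in a unique column of another table. The sketch below assumes two small hypothetical tables. The function name `foreign_key_candidates` is an illustration, not a reference to any product feature.

```python
# Hypothetical tables: each maps a column name to its list of values.
customers = {"customer_id": [1, 2, 3, 4], "region": ["N", "S", "N", "E"]}
orders = {"order_id": [10, 11, 12, 13], "cust_ref": [1, 1, 3, 4], "qty": [5, 9, 7, 8]}


def is_unique(values) -> bool:
    """True if no value in the column repeats."""
    return len(set(values)) == len(values)


def foreign_key_candidates(child, parent):
    """Return (child_column, parent_column) pairs where every child value
    also appears in a unique parent column (an inclusion check)."""
    pairs = []
    for parent_col, parent_values in parent.items():
        if not is_unique(parent_values):
            continue  # a referenced column should identify parent rows
        parent_set = set(parent_values)
        for child_col, child_values in child.items():
            if set(child_values) <= parent_set:
                pairs.append((child_col, parent_col))
    return pairs


print(foreign_key_candidates(orders, customers))
```

Small integer columns can pass this check by coincidence, so each candidate should be reviewed by an analyst before it is treated as a real relationship.
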
An additional step, data rule validation, is often considered the final step in profiling data. This proactive method uses a set of predefined rules to verify the authenticity and accuracy of the data entered.
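
Data rule validation can be pictured as a list of named checks applied to every record. The rules below (age range, email shape, five-digit postcode) are assumptions chosen for illustration. In practice the rules come from your own business requirements.

```python
import re

# Hypothetical business rules: each maps a rule name to a check on one record.
RULES = {
    "age is between 0 and 120": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "email contains one @": lambda r: isinstance(r.get("email"), str) and r["email"].count("@") == 1,
    "postcode is 5 digits": lambda r: re.fullmatch(r"\d{5}", str(r.get("postcode", ""))) is not None,
}


def validate(record):
    """Return the names of all rules the record violates."""
    return [name for name, check in RULES.items() if not check(record)]


records = [
    {"age": 34, "email": "a@example.com", "postcode": "92688"},
    {"age": 150, "email": "b.example.com", "postcode": "9268"},
]

for record in records:
    print(record, "->", validate(record) or "OK")
```

Keeping the rules in one named table makes it easy to report which rule failed, not just that a record is invalid.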

The above-mentioned techniques of data profiling can be carried out using automated services or manually by an analyst.

The data profiling process helps to verify whether the rows in the table are filled with accurate and valid data, as well as to understand its quality. Once a problem is detected, you need to fix it by including the corrective steps in your data quality project. Data profiling helps in governing your data properly.

Melissa Automates Clean Data Processes with Intuitive Customer Data Validation Platform

Melissa Team | 2020, Address Check, Address Correction, Address Standardization, Address Verification, Data Matching, Email Verification, Geocoding, Press Release, Unison

Unison reduces strain on SQL developers by empowering data stewards to manage operations – no programming expertise required

Rancho Santa Margarita, CALIF – November 4, 2020 – Melissa, a leading provider of global data quality and address management solutions, today announced its Unison customer data verification platform as an easy-to-deploy solution for data stewards to maintain impeccable customer data. Unison brings together all of Melissa’s data quality API capabilities including global address, name, phone, email verification, geocoding, and data matching, into a flagship UI that is fast, scalable, and requires no programming. Advanced, proprietary fuzzy matching algorithms with golden record/survivorship rules are built in and controllable through a simple, intuitive interface.

“When dealing with customer data, SQL Server developers face common issues such as unwieldy packages, system updates, extensive vendor options, and version control which exacerbate already complex automation and rapidly evolving processes,” said Bud Walker, vice president, enterprise sales and strategy, Melissa. “With Unison, data stewards are instead directly empowered to validate, cleanse, import, and export data via an intuitive, project-oriented framework, freeing their developer colleagues to tackle higher value tasks.”

Unison offers flexibility through a data-agnostic technology stack, connecting disparate data streams to quickly verify, enrich, and unify customer profiles with accurate, up-to-date data. The platform is scalable across limitless nodes, with flexible collaboration, scheduling, and rights management. Its architecture capitalizes on existing computer assets, disseminating customer data quality jobs across the enterprise and harnessing the processing power required to quickly render contact data clean and reliable. Scalable to accommodate huge datasets and multiple secure users, Unison also offers visual analytics, detailed logging, and audit trails for stewards and stakeholders alike.

Gain more insight on Unison with a product demo or via Pass Virtual Summit 2020, November 10-13. To connect with members of Melissa’s global intelligence team, visit www.Melissa.com or call 1-800-MELISSA.

About Melissa
Since 1985, Melissa has specialized in global intelligence solutions to help organizations unlock accurate data for a more compelling customer view. More than 10,000 clients worldwide in arenas such as retail, education, healthcare, insurance, finance, and government, rely on Melissa for full spectrum data quality and ID verification software, including data matching, validation, and enhancement services to gain critical insight and drive meaningful customer relationships. For more information or free product trials, visit www.Melissa.com or call 1-800-MELISSA (635-4772).

Media contacts
Greg Brown
Vice President, Global Marketing, Melissa
+1-800-635-4772 x1130

Jacqueline Zerbst
MPowered PR for Melissa

How to Maximise Trading During the Festive Season

Melissa UK Team | 2020, Data Audit, Data Cleansing, Data Quality, Ecommerce

The festive season is fast approaching. With Black Friday around the corner, retailers must turn their thoughts to what will be their most profitable period: quarter 4.
In normal circumstances, Black Friday and even Cyber Monday would bring a barrage of sale-hungry consumers rushing around the high street to their favourite brick-and-mortar stores, alongside a surge in eCommerce sales.
Brands now tend to take advantage of the “whole” buying period by continuing promotions and discounts to encourage further spending during Christmas. Even without these efforts, the traditional Christmas period is, as we all know, another busy time for retailers.

The stigma in your customer data

Marketing teams drive these sales and promotions over the festive period. Although they split their annual budgets and run extensive campaigns quarter to quarter, it’s common to see the highest spend in the lead-up to Q4, to help maximise profits before and during the festive season.

What tends to hold retailers back from higher sales growth is the customer data they are working with. “Fragmented” data is a common problem, and it can make these prolonged campaigns in the second half of the year less effective than they should be. This is worth considering when giants like Amazon account for as much as 54.9% of sales during this period.

Retailers must first understand where their customers are interacting and engaging, which tends to span a range of channels. These include websites and online stores, apps, social media, customer service and technical departments, right down to the physical brick-and-mortar stores.

During engagement on these channels, customers and prospects leave a diverse set of specific and pertinent information that depends on the individual’s activity on that channel. The issue is that multiple departments may be gathering this information, which can cause confusion, duplicate records, and inconsistencies across the whole customer life cycle.

Clean and verified data

Another issue that tends to hold retailers back is obtaining clean and verified contact data on their customers.

There are two scenarios:

Customers onboarding and entering systems & databases

Typically, when customers onboard into the organisation’s systems via purchase, registration, signup, enquiry, or any other form of data entry, the following issues arise:

– Incorrect or misspelt information
– Missing information
– Formatting issues such as casing and standardisation
– Data in the wrong fields
– The same information structured differently across various channels and systems
– Conflicting information (where it must be determined which version is right, or whether both are)

Current Customer records going stale once entered into systems & databases

24% of database records go stale each year due to the following:

– Moving address
– Changing status (e.g. married)
– Changing email address
– Changing jobs
– Changing phone numbers
– Suppression or death
– DMA and preferences

This is why it is equally important to have clarity and consistency in your customer data, so that each department can deliver better results, leading to increased overall revenue.

Data deduplication, matching & merging

While 24% of customer data can go stale over time, in our experience we also see that a further 10% of databases contain duplicated records. This is enough to impact the delivery of the single customer view every retailer strives for.

When customers are duplicated across departments and systems in an organisation, the same contact can appear as a new customer in one record, a loyal customer in another, and a prospect in a third. This leads to confusion and poor communication, which wastes spending and risks alienating customers.

Enrich & enhance your data

When it comes to data, the more you get out of it, the more you can do with it. Data enrichment and enhancement is a great way to truly understand your customers on various levels. Additions such as geolocation to pinpoint location, demographic and firmographic insight, IP location, and missing contact information can not only improve targeting for better direct engagement but also help you find new customers just like your best ones.

Data audits and health checks

We recommend that all retailers have their data audited periodically so they can get a clearer picture of the overall health of the data they are working with. This is a great way to identify any flaws and inefficiencies that may be causing issues which, as you should know by now, create further setbacks to business success.


Retailers must understand that the accuracy of their customer data directly impacts every downstream business activity, from reporting and analytics, segmentation and targeting, and marketing all the way down to logistics, delivery, and customer care. So, if your data is inconsistent, everything is going to suffer.

Below is the data quality life cycle that Melissa can put in place for any retailer looking to achieve more. Its aim is, again, the one thing every retailer strives for: the single customer view, or the one golden record that aggregates all the important information about a customer or prospect so that all departments have a clear view and understanding of their overall journey with a retailer.

This in turn allows retailers to make sensible business decisions when communicating with their customers, giving them an outstanding experience in the build-up to the festive season.
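
To make the idea of a golden record concrete, here is a minimal Python sketch that merges duplicate records for one customer. The survivorship rule used (keep the most recently updated non-blank value for each field) is an assumption chosen for illustration, as are the sample records. It is not a description of how Melissa's products build golden records.

```python
from datetime import date

# Hypothetical duplicate records for one customer from three systems.
duplicates = [
    {"source": "web", "updated": date(2020, 3, 1), "name": "J. Smith",
     "email": "jsmith@example.com", "phone": ""},
    {"source": "store", "updated": date(2020, 9, 15), "name": "Jane Smith",
     "email": "", "phone": "01234 567890"},
    {"source": "app", "updated": date(2019, 11, 2), "name": "Jane Smith",
     "email": "jane.smith@example.com", "phone": ""},
]


def golden_record(records, fields):
    """Survivorship rule (an assumption): for each field, keep the most
    recently updated non-blank value."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    merged = {}
    for field in fields:
        merged[field] = next((r[field] for r in newest_first if r[field]), "")
    return merged


print(golden_record(duplicates, ["name", "email", "phone"]))
```

Other survivorship rules are equally plausible, such as preferring a trusted source system or the most complete value, and the right choice depends on the retailer.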



[Figure: The Data Quality Life Cycle]

Best Practices for a Better Email Sender Reputation

Melissa IN Team | Email Verification

Studies have shown that one of the best ways for brands to reach out to their customers and engage with them is through email marketing. This holds true for a mature audience as well as a young, tech-savvy audience.

That said, no email campaign can be successful if your emails do not reach the intended recipients. This is where your reputation as a sender becomes important. If you don’t have a good sender reputation, email service providers may divert your emails to the spam folder. You may even be subject to fines. So, let’s take a look at the factors influencing your sender reputation and how you can improve it.

What Gives You A Poor Sender Reputation?

The key reasons why you may get a low sender reputation score are:

  • Recipients often mark your emails as spam
  • Not enough recipients open your emails
  • A consistently high bounce rate or a high number of emails sent to invalid email accounts
  • High unsubscribe rates

At the crux of these factors lies a reliance on bad data. Think about it – would a person who signed up to receive emails from you mark them as spam? No – this typically happens when information in your database is outdated. For example, if a subscriber signed on to a newsletter with a work email address and later left that job, any email sent to that address would bounce. The good news is that there’s plenty you can do to improve your sender reputation.

Simple Steps To Improve Sender Reputation

The first step to improving your sender reputation is understanding where you stand. To do this, you should conduct an IP reputation check. IP reputation is typically scored between 0 and 100, with 100 being the highest. Your reputation should ideally be above 70. If it is below 70, here are a few things you can do.

  • Verify email addresses before they enter the system

Given how poor data can damage your sender reputation, the first step towards improving it is to clean your email database. For this, you must first ensure that all email addresses being added to the database are complete and valid. There are a number of email verification tools that can help with this task.

They can help identify incomplete email addresses or those with incorrect syntax and make corrections to them. For example, if a person enters their email address as ajay@gmial.com, the verification tool will correct it to ajay@gmail.com. Also, avoid buying or renting email address lists for your marketing campaigns. Similarly, you may want to avoid adding official work email addresses to your mailing lists.
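
The kind of check described above can be sketched in Python with a syntax test plus a closest-match lookup for mistyped domains. This is a simplified illustration, not how any specific verification tool works. The short `KNOWN_DOMAINS` list, the deliberately loose regular expression, and the `check_email` function are all assumptions.

```python
import difflib
import re

# Assumption: a short list of domains to correct towards; a real tool
# would rely on a much larger, maintained reference list.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com", "hotmail.com"]

# Deliberately simple syntax check; it does not implement the full email standard.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def check_email(address: str):
    """Return (is_valid_syntax, suggested_address_or_None)."""
    address = address.strip()
    if EMAIL_RE.fullmatch(address) is None:
        return False, None
    local, domain = address.rsplit("@", 1)
    domain = domain.lower()
    if domain in KNOWN_DOMAINS:
        return True, None
    # Suggest the closest known domain when the typed one is a near miss.
    close = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=0.8)
    suggestion = f"{local}@{close[0]}" if close else None
    return True, suggestion


print(check_email("ajay@gmial.com"))
print(check_email("ajay@gmail"))
```

A suggestion is best shown to the person entering the address for confirmation, since an unfamiliar domain is not necessarily a typo.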

  • Keep the database clean

In addition to verifying new entries to your database, you must also monitor the health of your database from time to time. Email addresses should be checked for their validity regularly. You need to check that the email addresses listed are complete, correct and still in use. If you were to do this manually, it would take a lot of time and may additionally inconvenience your customers. Email verification tools conduct this validation without the need for manual inputs from the customer and are much quicker. Database cleaning should not be a one-time affair but an ongoing process.

  • Use segmented email lists

Customers often mark emails as spam or unsubscribe from mailing lists when the content of the emails does not interest them. Given that you probably offer more than one service, customers have unique reasons for signing up for your emails. To ensure that your emails contain information that will interest your customers, avoid using a single long list and use shorter, segmented lists instead. This makes it easier to match relevant content to your customers’ needs and preferences. For example, you may segment a mailing list according to the recipient’s location, age, their past purchases, etc.

  • Personalize the subject line

Personalizing the subject line of your emails can increase open rates by as much as 50%. This is one of the simplest steps that can get your message delivered and help boost your sender reputation. Instead of using a generic subject line, use it as an extension of your salutation and make sure you mention the recipient’s name in the subject line.

In Conclusion

Emails are a powerful means of marketing your products and services. By protecting your sender reputation, you can maximize the results from an email marketing campaign. With the right tools, you can ensure that your emails reach the right people, without getting pushed into the spam folders or filters. Further, you can ensure that they contain information that is relevant and adds value to your customers so that your audience looks forward to your emails and engages with them.

Finally, use an email verification tool to check your subscriber list before sending any messages. This kind of tool will help you remove invalid addresses from your list, which helps to preserve your IP reputation.

Disparate, Dirty, Duplicated Data – Understanding the 3Ds of Bad Data

Melissa AU Team | Data Quality

In 1999, NASA learned the hard way how expensive bad data can be when it lost the Mars Climate Orbiter. Why did this happen? Because the engineers made calculations based on the Imperial system of measurements while the NASA scientists used the Metric system.

A simple mistake of not ensuring that the data was measured in the same units cost NASA a spacecraft worth about $125 million. Such is the impact of bad data.

When talking of bad data quality, there are three ‘D’s that come into play – dirty data, disparate data and duplicate data.

  1. Dirty Data

Entering an address as ‘Main Str’ instead of ‘Main Street’, typographic errors, using numbers in fields intended only for letters – these are some of the most common examples of dirty data. Such data issues can be categorized as:

  • Incorrect spellings
  • Incorrect spacing
  • Incomplete information
  • Incorrect use of upper/lower cases
  • Use of abbreviations and nicknames
  • Incorrect use of punctuation and symbols

These may seem like small, inconsequential errors, but data specialists and analysts spend a considerable amount of their time simply cleaning dirty data like this. Leaving it as is is simply not an option. How can you expect delivery agents to reach customers on time if they do not have a complete address or if they cannot understand the street name?

And imagine a customer’s frustration if they were to receive a promotional email that addresses them by a misspelled name…
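
Several of the dirty-data categories above (spacing, casing, abbreviations) can be handled by a small standardization routine. The Python sketch below is a minimal illustration. The abbreviation table and the `clean_street` function are assumptions, and a real address verification tool would rely on reference data and not on a hand-written lookup.

```python
import re

# Assumption: a tiny abbreviation table for illustration only.
STREET_ABBREVIATIONS = {"str": "Street", "rd": "Road", "ave": "Avenue"}


def clean_street(raw: str) -> str:
    """Trim and collapse whitespace, expand known abbreviations, fix casing."""
    words = re.sub(r"\s+", " ", raw).strip().split(" ")
    cleaned = []
    for word in words:
        key = word.rstrip(".,").lower()
        cleaned.append(STREET_ABBREVIATIONS.get(key, word.rstrip(",").capitalize()))
    return " ".join(cleaned)


print(clean_street("  main   str "))
print(clean_street("12 PARK ave."))
```

Rules like these fix formatting, but they cannot tell whether the street actually exists, which is why verification against an address database is a separate step.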

  2. Disparate Data

Companies collect data from various sources. In theory, this helps create a cohesive record. But the issue with collecting data from multiple sources is that every source may use a different format to record and present data. A difference in date formats is the simplest example.

The sales team may record dates in the DD/MM/YYYY format while the accounts team may record them in the MM/DD/YYYY format. Thus, the latter may read 06/12/2020 as the 12th of June while the sales team may be referring to the 6th of December. It’s a small misunderstanding that can have dramatic impacts on sales projections, marketing plans, etc.
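
The date example can be reproduced with Python's standard `datetime` module. The sketch parses the same string under both formats and adds a small, hypothetical helper, `is_ambiguous`, that flags dates which parse to different days depending on the format assumed.

```python
from datetime import datetime

raw = "06/12/2020"

as_sales = datetime.strptime(raw, "%d/%m/%Y").date()     # DD/MM/YYYY reading
as_accounts = datetime.strptime(raw, "%m/%d/%Y").date()  # MM/DD/YYYY reading

print(as_sales.isoformat())     # 2020-12-06
print(as_accounts.isoformat())  # 2020-06-12


def is_ambiguous(text: str) -> bool:
    """A slash date is ambiguous when both readings parse to different days."""
    readings = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # this reading is not a valid calendar date
    return len(readings) > 1


print(is_ambiguous("06/12/2020"), is_ambiguous("25/12/2020"))
```

Storing dates in an unambiguous form such as ISO 8601 (YYYY-MM-DD) once they have been parsed avoids the problem for every system downstream.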

Disparate data refers to data extracted from different sources and stored in varied data formats. This type of bad data keeps analysts from getting a deeper insight and makes it difficult for them to derive anything of value from the data.

  3. Duplicate Data

Duplication is the third and, in many ways, the biggest data quality issue. There are many reasons why your databank may contain duplicate records.

  • A new record may be created every time information is updated. For example, a new record may be created every time a customer makes a purchase instead of updating the original record.
  • New records may be created every time a customer interacts with the brand through a different medium. For example, let’s say a customer places an order through a brand’s website. The next time, he places an order through the app. Instead of using a single account for both interactions, he may create different accounts – one with his first name and one with his last name.
  • New records may be created when re-registering with new phone numbers or email IDs.
  • System glitches

Duplicate records make a data bank very unreliable. Think of it this way – the marketing team looks at a data bank of 500 records. Of these, 300 seem to be in a particular geographic area, and hence the team decides to open a new branch for easier accessibility.

However, 120 of those records are duplicates. Thus, the new branch will cater to only 180 customers in reality. If the team had had access to this information, it might not have decided to open a store in that particular location.

Eliminating all duplicate records manually is simply not possible. For example, a person may create accounts as ‘Aditya Chauhan’, ‘Adi Chauhan’, ‘A. Chauhan’, etc. While some records may share the same email address, others may share only a phone number. Thus, to truly de-duplicate records, you need an algorithm that compares all the data rows and flags cases with even a low probability of duplication.
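
As a rough sketch of such an algorithm, the Python example below groups records that share an email address or a phone number, and links them transitively so that two records with nothing in common are still grouped when a third record connects them. It uses exact matching on hypothetical sample data. Fuzzy comparison of names, which commercial matching tools add, is left out to keep the example short.

```python
# Hypothetical customer records; the names mirror the example in the text.
records = [
    {"id": 1, "name": "Aditya Chauhan", "email": "adi@example.com", "phone": "111"},
    {"id": 2, "name": "Adi Chauhan", "email": "adi@example.com", "phone": ""},
    {"id": 3, "name": "A. Chauhan", "email": "", "phone": "111"},
    {"id": 4, "name": "Priya Nair", "email": "priya@example.com", "phone": "222"},
]


def find(parent, x):
    """Follow parent links to the representative of x's group."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps chains short
        x = parent[x]
    return x


def duplicate_groups(records, keys=("email", "phone")):
    """Group records that share any non-blank value in one of the keys."""
    parent = {r["id"]: r["id"] for r in records}
    first_seen = {}
    for r in records:
        for key in keys:
            value = r[key].strip().lower()
            if not value:
                continue
            # Link this record to the first record seen with the same value.
            owner = first_seen.setdefault((key, value), r["id"])
            parent[find(parent, r["id"])] = find(parent, owner)
    groups = {}
    for r in records:
        groups.setdefault(find(parent, r["id"]), []).append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


print(duplicate_groups(records))
```

Records 2 and 3 share neither an email address nor a phone number, yet both end up in the same group because each matches record 1.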

Dealing With The Three ‘D’s

Bad data becomes more expensive the longer it is kept. Thus, it needs to be dealt with as early as possible. Logically speaking, the first step is to put quality checks in place at the data collection source. There are a number of tools that can help with this.

For example, address verification tools ensure that complete addresses are captured. Instead of relying on human input for the complete information, using an autocomplete feature can help minimize errors and capture information in standardized formats. Similar tools can also compare data in new records to the existing database and keep duplicates from being created.

Instead of blaming IT for bad data, organizations need to create data governance policies that standardize fields for records and minimize the issue of disparate data. Such a policy outlines how data is collected, processed and managed to ensure that it is accurate and consistent. It should ideally be flexible so that it can be adapted to changing needs.

Data quality checks at only the collection source are not sufficient to keep bad data out of your system. Data often goes bad simply with time. For example, a city may choose to rename a street, thus, invalidating records that mention the old street name as part of the address. To counter this, data quality checks must be made a routine task.

All you need to do is find the right tools. For example, email verification tools can ping emails without human interaction by the company or customers to check whether the email ID is still in use. Those that have been discontinued can be highlighted and removed from the system.

Lastly, it is important to set realistic goals. Hoping to achieve 100% perfect data is a bad goal. Instead, your goal should be to make data credible and fit for intended use by ensuring that it is accurate, complete, valid, standardized and accessible.