Frequently Asked Questions

Matching Results



How does dedupeIT's matching work?

dedupeIT breaks the matching process into two steps:

  • The Import process loads the data into a dedupeIT database, and generates phonetic and other "match keys". The phonetic keys allow it to match e.g. Deighton and Dayton, Phillips and Philips.
  • The Find Matches process takes the data file created by Import, and looks for matches (i.e. potential duplicates) in it. It does this by using match keys to highlight pairs of records that might be matches. It takes each pair of records that may match and compares the name, address and Zip/postcode fields in both records. Each of these items that matches contributes a score to a total match score for the pair. The more closely the pair of records matches overall, the higher the total match score. There are many specific techniques applied to the different types of data, for example:
    • When comparing name fields, dedupeIT compares them as a block, even if they are separated into e.g. title, initials and last name. dedupeIT allows for phonetic, reading and typing errors in the data. It also allows for nicknames and missing and reversed elements in names e.g. Robert Graham, Bob Graham and Mr R J Graham all get a sure match score; Robert Graham and Mr Graham get a likely match score; Robert Graham and Graham Robert get a possible match score.
    • When comparing address lines, dedupeIT also compares them as a block, to allow for cases where e.g. one address has a house or building name and the other address does not.
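
As an illustration only, the sketch below shows how a phonetic match key can bring likely duplicates together before any detailed comparison is made. The key rules and the names phonetic_key and candidate_pairs are assumptions made for this example; they are not dedupeIT's actual keys or code.

```python
import re
from collections import defaultdict
from itertools import combinations


def phonetic_key(name: str) -> str:
    """Build a crude phonetic key (hypothetical rules, not dedupeIT's own)."""
    s = re.sub(r"[^a-z]", "", name.lower())    # keep letters only
    s = s.replace("ph", "f").replace("gh", "")  # "ph" sounds like "f"; treat "gh" as silent
    if not s:
        return ""
    # Keep the first letter, then drop vowels and weak consonants.
    key = s[0] + re.sub(r"[aeiouyhw]", "", s[1:])
    # Collapse repeated letters such as "ll".
    return re.sub(r"(.)\1+", r"\1", key)


def candidate_pairs(records):
    """Pair up record ids whose last names share a phonetic key."""
    buckets = defaultdict(list)
    for rec_id, last_name in records:
        buckets[phonetic_key(last_name)].append(rec_id)
    pairs = []
    for ids in buckets.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


records = [(1, "Deighton"), (2, "Dayton"), (3, "Phillips"), (4, "Philips"), (5, "Graham")]
print(candidate_pairs(records))  # [(1, 2), (3, 4)]
```

Only the pairs produced this way would then go on to the field-by-field comparison and scoring described above.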

When I dedupe my file, the dupes shown are all false matches (not duplicates).

If none of the matches shown are potential duplicates, it is likely that the Import Wizard set the file up incorrectly. You can check this by looking at the field names shown in the Inspect Matches window. If any field name does not correctly describe the data in that field, repeat the job, taking care to label all the name and address fields correctly. Even if the field names are correct, try repeating the job. If the dupes are still all false, contact helpIT systems.


The records shown by dedupeIT are mostly/frequently not dupes.

dedupeIT shows the lowest-scoring (least likely) matches first, so as you go through the list of matches, they should start to include more true matches and fewer false matches. If this is the case and the matches at the lower scores are all false, you can increase the Minimum score to report in dedupeIT Options to a value that excludes these false matches. If the false matches are distributed evenly through the file, refer to the answer to the previous question.
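
The effect of raising the Minimum score to report can be pictured with the short sketch below. It is an illustration under stated assumptions, not dedupeIT's code: the function name matches_to_report, the (record A, record B, score) tuples and the score values are all hypothetical.

```python
def matches_to_report(matches, minimum_score):
    """Keep pairs scoring at least minimum_score, lowest score first.

    Each match is a hypothetical (record_a, record_b, score) tuple.
    """
    kept = [m for m in matches if m[2] >= minimum_score]
    return sorted(kept, key=lambda m: m[2])


matches = [(1, 2, 120), (3, 4, 70), (5, 6, 95), (7, 8, 75)]

# With a low minimum, the weakest pairs are listed first.
print(matches_to_report(matches, 70))  # [(3, 4, 70), (7, 8, 75), (5, 6, 95), (1, 2, 120)]

# Raising the minimum removes the low-scoring pairs from the list.
print(matches_to_report(matches, 90))  # [(5, 6, 95), (1, 2, 120)]
```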


When I dedupe my file, I have to spend ages confirming which records are actually dupes.

Refer to the answer to the previous question. If the matches are all true matches above a particular score, you can use Delete Matches when prompted (or from the Results menu) to automatically remove all the dupes that reach that score.


dedupeIT is not picking up all the dupes in my file.

Check that the field names are correct by looking at the field names shown in the Inspect Matches window. If any field name does not correctly describe the data in that field, repeat the job, taking care to label all the name and address fields correctly. If the field names are correct, try reducing the Minimum score to report in dedupeIT Options by 10 or 15. If the duplicates are still not all being picked up, contact helpIT systems.


When I delete the dupes in Inspect Matches, I often find records that have already been deleted or have already been shown.

This happens when more than two records match each other. For example, if records 1, 2 and 3 all match, then record 1 matches record 2, record 1 matches record 3, and record 2 matches record 3. You will therefore see each of these records twice when reviewing the matches, though not necessarily consecutively, because the different pairs may have been given different scores. For this reason, dedupeIT allocates a unique reference to each record, so that you can see whether a record has been repeated in another pair or is actually a different record.
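
The arithmetic behind this can be checked with a small sketch. The function name pair_appearances is hypothetical and the code only counts pairs; it does not reflect how dedupeIT stores its results.

```python
from collections import Counter
from itertools import combinations


def pair_appearances(cluster):
    """List every pair within a group of mutually matching records,
    and count how many pairs each record appears in."""
    pairs = list(combinations(sorted(cluster), 2))
    counts = Counter(rec for pair in pairs for rec in pair)
    return pairs, dict(counts)


pairs, counts = pair_appearances([1, 2, 3])
print(pairs)   # [(1, 2), (1, 3), (2, 3)]
print(counts)  # {1: 2, 2: 2, 3: 2}
```

With four mutually matching records there would be six pairs, and each record would appear three times.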


Can I correct the data shown when I'm looking at dupes?

Yes. You can overtype data or use the normal Edit functions of Cut (Ctrl+X), Copy (Ctrl+C) and Paste (Ctrl+V). In addition, you can copy the contents of any field to the corresponding field in the other record by right-clicking on the field.


Can I automatically combine the data shown in each pair of records to create a "best record"?

Not in dedupeIT. To do this, you can buy a more functional product, matchIT, from the helpIT systems range. For more information, visit www.helpit.com.


Are there keyboard shortcuts (equivalents) to using the mouse in Inspect Matches?

Yes. To see what each shortcut is, hover the mouse over the corresponding button.


What does the Next Score button mean (in Inspect Matches)?

This skips over the displayed match pair and all others that have the same match score as the displayed pair. It will display the first matching pair that has a higher match score.
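
The behaviour described above can be sketched as follows, assuming the pairs are held in a list ordered by ascending score. The function name next_score_index and the example scores are hypothetical.

```python
def next_score_index(scores, current):
    """Return the index of the first pair scoring higher than the pair
    at position `current`, or None if no higher-scoring pair remains.

    `scores` is assumed to be sorted in ascending order.
    """
    current_score = scores[current]
    for i in range(current + 1, len(scores)):
        if scores[i] > current_score:
            return i
    return None


scores = [80, 80, 85, 85, 90]
print(next_score_index(scores, 0))  # 2 (skips the other pair scoring 80)
print(next_score_index(scores, 2))  # 4
print(next_score_index(scores, 4))  # None
```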


Dupes are shown more than once on the Matching Pairs report.

Refer to the answer to the question "When I delete the dupes in Inspect Matches, I often find records that have already been deleted or have already been shown".


Why are some exact matches only scoring 100, when similar identical pairs are scoring 120?

The most common cause of this is that one or both of the names have no forename or initials, for example Mr J Brown and Mr Brown. In this case, dedupeIT doesn't have enough information to be sure that these are actually the same person, e.g. they could be father and son. dedupeIT gives this match a lower score than a full exact match to reflect this possible difference.


I have two records that are exactly the same, but they aren't scoring the maximum score. Why is this?

The fewer data items that are the same, the lower the match score: empty data items score less than non-empty identical items. Refer also to the answer to the previous question. Other common reasons for this happening are:

  • neither record has a zip/postcode
  • the person's name does not seem to be a valid name e.g. Managing Director, Part Sales
  • company name is empty, when deduping to business level.
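
A rough sketch of why an empty item lowers the total is given below. The field weights, the half score for an item that is empty in both records, and the name total_score are all assumptions chosen for illustration; dedupeIT's real scores and rules are not shown here.

```python
# Hypothetical weights, chosen only to make the arithmetic easy to follow.
FIELD_WEIGHTS = {"name": 4, "address": 4, "postcode": 2}
EMPTY_FRACTION = 0.5  # assumed share of the weight when both items are empty


def total_score(rec_a, rec_b):
    """Add up a score for each field that agrees between two records."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        a = rec_a.get(field, "").strip().lower()
        b = rec_b.get(field, "").strip().lower()
        if a and a == b:
            total += weight                   # identical, non-empty item
        elif not a and not b:
            total += weight * EMPTY_FRACTION  # both empty: scores less
    return total


full = {"name": "Robert Graham", "address": "1 High Street", "postcode": "AB1 2CD"}
no_postcode = {"name": "Robert Graham", "address": "1 High Street", "postcode": ""}

print(total_score(full, full))                # 10.0
print(total_score(no_postcode, no_postcode))  # 9.0
```

Both pairs are identical, but the pair without a postcode ends up with the lower total.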

I often see dupes where one record is male and the other female. How can I prevent this?

Switch on the dedupeIT option Must have match on gender to avoid seeing these matches. This setting is the default for matching to Person level. To restore the default settings, just reselect Person (contact) Matching Level.


How can I restore the default settings for matching?

From dedupeIT Options, just reselect the appropriate Matching Level.


What do the statistics shown on the summary report mean?

These are explained fully in the on-line Help in the topics Import Summary and Matching Summary.



 
