Frequently Asked Questions

Processing



dedupeIT seems to be hanging (or taking ages) when deduping my file. Why?

A deduplication job is broken into three processes: Import, Find Matches and Output. The process currently running is shown as the Step Type in the "Statistics for this Step" section (lower left) of the dedupeIT window. The Import and Output processes should run at a fairly even rate, and the progress information in this section of the window shows how far each step still has to run. The Find Matches process can run at an uneven rate, depending on which step is being processed.

If there is no sign of activity (e.g. hard disk light flashing) during the Import or Output process, dedupeIT has probably hung and you will need to End Task (via Ctrl+Alt+Del) and try again. If necessary, reboot the PC and try again.

If there is no sign of activity during the Find Matches process, it may simply be that your file has a lot of incomplete data (e.g. empty names), which can cause dedupeIT to slow down significantly. Try leaving dedupeIT running for another 5 minutes or so. If there is still no evidence of progress, End Task as described above.


I get a message "The ESTIMATED number of records is in excess of the 50,000 record limit for this version of dedupeIT" when I start to process my file.

dedupeIT is limited to files of 50,000 records or fewer, because larger files tend to require more flexible matching options than dedupeIT is designed to provide. This message appears when dedupeIT's approximate calculation suggests that your file exceeds the limit. The import of the file will continue regardless, so that dedupeIT can check whether the actual number of records is in fact greater than 50,000. If it is, the job will stop and you cannot process this file in dedupeIT. To dedupe this file, you can buy a more functional product from the helpIT systems range, matchIT. For more information, visit www.helpit.com.
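
The warn-then-verify behaviour described above can be pictured with a short Python sketch. This is only an illustration and not dedupeIT's actual code: the function name import_records, its parameters and the error text are hypothetical, and how dedupeIT arrives at its estimate is not documented here, so the estimate is simply passed in.

```python
RECORD_LIMIT = 50_000  # limit quoted in this FAQ


def import_records(lines, estimated_count, limit=RECORD_LIMIT):
    """Import records, warning on a high estimate but stopping only on the real count.

    Hypothetical helper: names and return shape are assumptions for illustration.
    """
    # The estimate only triggers a warning; it does not stop the import.
    warned = estimated_count > limit

    imported = []
    for line in lines:
        imported.append(line)
        # The job is stopped only once the actual count exceeds the limit.
        if len(imported) > limit:
            raise ValueError(f"file exceeds the {limit} record limit")
    return imported, warned
```

In this sketch a file whose estimate is too high but whose real record count is within the limit is still imported in full, which is the reason the import carries on after the warning.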


I get a message "this version of dedupeIT is limited to files of 50,000 records. The Import procedure will be terminated." when I start to process my file.

Refer to the answer to the previous question.


What do the statistics shown on the screen (when dedupeIT is processing) mean?

When dedupeIT is processing your data, you will see some bar graphs appear on the screen, which show the progress of the job and various statistics about your file and the data within it. These are explained fully in the on-line Help.


What does "Number of groups > 500" on the matching statistics display mean?

If you are working on large files, or on files with a very high level of duplication, the matching statistics may report a number of groups greater than 500. The reason is as follows:

To avoid grinding to a halt (or appearing to "hang"), dedupeIT will not compare every single record for a given match key with every other record for that match key if the group of records with that match key value (e.g. a specific postcode) exceeds 500 records. Generally, this will not affect the quality of your results, because any duplicates missed on one pass of matching will be found on another. If the number of groups reported here is very high, or you are concerned about it, you can do another dedupe run on the output file from dedupeIT. This will tell you whether dedupeIT missed any duplicates the first time around and should find them for you, unless the data has an excessive number of records sharing, for example, the same postcode or company.
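
To see why large groups are capped, here is a minimal Python sketch of grouping records by match key and limiting the comparisons. It is not dedupeIT's implementation: the function candidate_pairs and its parameters are hypothetical, and the sketch assumes that an oversized group is skipped entirely on that pass, whereas the text above only says that not every record in such a group is compared with every other.

```python
from collections import defaultdict
from itertools import combinations

MAX_GROUP_SIZE = 500  # threshold quoted in this FAQ


def candidate_pairs(records, key_func, max_group_size=MAX_GROUP_SIZE):
    """Return (pairs to compare, number of oversized groups) for one match key.

    Hypothetical helper: skipping oversized groups outright is an assumption.
    """
    # Group record positions by their match key value (e.g. postcode).
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[key_func(record)].append(index)

    pairs = []
    oversized = 0
    for members in groups.values():
        if len(members) > max_group_size:
            # A group of n records needs n * (n - 1) / 2 comparisons,
            # so very large groups are counted and left for another pass.
            oversized += 1
            continue
        pairs.extend(combinations(members, 2))
    return pairs, oversized
```

A group of 500 records already needs 124,750 pairwise comparisons, and the count grows with the square of the group size, which is why a few very large groups can make a job appear to hang.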



 
