Pitney Bowes Group 1 Software


"Deduplication"

Deduplication, also known as record linkage, is the task of finding entries that refer to the same entity (duplicates) across multiple files. It is used when merging two or more data sets, and it is a useful tool in data mining tasks where the data originate from different sources or different organizations.
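
As a minimal illustration of the task, the following Python sketch links records from two small in-memory "files" by comparing a normalized key. The field names (name, zip), the sample records, and the helper functions normalize and find_duplicates are hypothetical, chosen only for this example. Exact matching on a normalized key is a deliberate simplification; practical systems usually also need approximate comparison to catch misspellings and other variations.

```python
def normalize(record):
    """Build a comparison key: lowercase the name, drop punctuation,
    collapse runs of whitespace, and trim the postal code."""
    name = "".join(
        ch for ch in record["name"].lower() if ch.isalnum() or ch.isspace()
    )
    return (" ".join(name.split()), record["zip"].strip())


def find_duplicates(file_a, file_b):
    """Return (record_from_a, record_from_b) pairs whose keys are equal."""
    # Index the first file by normalized key for fast lookup.
    index = {}
    for rec in file_a:
        index.setdefault(normalize(rec), []).append(rec)

    # Probe the index with every record of the second file.
    matches = []
    for rec in file_b:
        for candidate in index.get(normalize(rec), []):
            matches.append((candidate, rec))
    return matches


# Hypothetical sample data standing in for two source files.
file_a = [
    {"name": "John  Smith", "zip": "12345"},
    {"name": "Ann Lee", "zip": "54321"},
]
file_b = [
    {"name": "john smith.", "zip": "12345 "},
    {"name": "Bob Ray", "zip": "11111"},
]

duplicates = find_duplicates(file_a, file_b)
```

Here the two "John Smith" entries differ in case, spacing and punctuation, yet they produce the same key and are reported as one duplicate pair.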

Record linkage is the term used by statisticians, epidemiologists and historians, among others. Commercial mail and database applications refer to it as merge/purge processing or list washing. Computer scientists often refer to it as data matching or as the object identity problem. Other names used to describe the same concept include entity resolution, duplicate detection, record matching, instance identification, coreference resolution, reference reconciliation and database hardening.