A Heuristic Approach To Record Deduplication
Lata.S.Math, Smt.Shashikala.B
Database Administration, Record Deduplication, Genetic programming
Databases and database-related technologies have a major impact on the growing use of computers. Many global data repositories collect data from various sources, which increases the chance of duplicate records. Duplicates in a database typically result from misspelled words and differing writing styles. Their presence degrades system performance, because retrieving the correct, relevant data takes longer. Clean, replica-free repositories allow retrieval of higher-quality information. Record deduplication is the process of identifying and removing duplicates present in a database. Existing approaches to designing a deduplication function include the domain knowledge approach, the probabilistic approach, and the machine learning approach; these additionally require human judgment and considerable computation time. To address this problem, this project proposes a model that designs the deduplication function for identifying duplicate records in a data repository using genetic programming. Genetic Programming (GP) is a heuristic approach that automatically suggests a deduplication function based on the evidence present in the data repository. The deduplication function predicts whether two records are duplicates. Its main goal is to avoid the problems that arise from duplicate values in the database. The proposed model uses the Jaro-Winkler similarity function to calculate the similarity between records.
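As an illustration of the similarity measure named in the abstract, the following is a minimal Python sketch of the standard Jaro-Winkler function. It is not the authors' implementation. The helper `is_duplicate`, its plain average over fields, and the 0.9 threshold are hypothetical stand-ins for the deduplication function that the proposed model would obtain through genetic programming.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


def is_duplicate(rec_a: dict, rec_b: dict, fields: list, threshold: float = 0.9) -> bool:
    """Hypothetical fixed rule: average field similarity against a threshold.

    In the proposed model, a function of this kind would be evolved by GP
    rather than written by hand; the average and the threshold are assumptions.
    """
    scores = [jaro_winkler(rec_a[f], rec_b[f]) for f in fields]
    return sum(scores) / len(scores) >= threshold
```

For example, `jaro_winkler("MARTHA", "MARHTA")` evaluates to roughly 0.961: all six characters match, one transposition is counted, and the shared prefix "MAR" raises the Jaro score of about 0.944.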
Article Details
Unique Paper ID: 142476

Publication Volume & Issue: Volume 2, Issue 2

Page(s): 148 - 154
About Us

IJIRT.org supports research by providing high-quality, open-access research articles.

Send us any query related to your research on editor@ijirt.org
