A Heuristic Approach To Record Deduplication
Author(s):
Lata.S.Math, Smt.Shashikala.B
Keywords:
Database Administration, Record Deduplication, Genetic programming
Abstract
Databases and database-related technologies have a major impact on the growing use of computers. Many global data repositories collect data from a variety of sources, which increases the likelihood of duplicate records. Duplicates in a database typically result from misspelled or misleading words and from differing writing styles. Duplicate records degrade system performance, because retrieving the correct, relevant data from the database takes longer. Clean, replica-free repositories allow retrieval of higher-quality information. Record deduplication is the process of identifying and removing the duplicates present in a database. Existing approaches to designing a deduplication function include the domain-knowledge approach, the probabilistic approach, and the machine learning approach. These approaches require additional human judgment and substantial computation time. To address this problem, this project proposes a model that uses genetic programming to design the deduplication function for identifying duplicate records in a data repository. Genetic Programming (GP) is a heuristic approach that automatically suggests a deduplication function based on the evidence present in the data repositories. The deduplication function helps predict whether two records are duplicates. Its main goal is to avoid the problems that arise from duplicate values in the database. The proposed model uses the Jaro-Winkler similarity function to compute the similarity measure between records.
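To make the similarity step concrete, the following is a minimal sketch of the standard Jaro-Winkler measure applied to record fields. It is not the authors' implementation. The prefix scale `p=0.1`, the 4-character prefix cap, the helper names `record_similarity` and `is_duplicate`, and the `0.9` threshold are all assumptions chosen for illustration. In particular, the plain average over fields is only a stand-in for the deduplication function that GP would suggest from the evidence.

```python
from typing import Dict, Sequence


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (p and max_prefix are assumed defaults)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def record_similarity(a: Dict[str, str], b: Dict[str, str], fields: Sequence[str]) -> float:
    """Hypothetical evidence combination: plain average of per-field similarities."""
    return sum(jaro_winkler(a[f], b[f]) for f in fields) / len(fields)


def is_duplicate(a: Dict[str, str], b: Dict[str, str], fields: Sequence[str],
                 threshold: float = 0.9) -> bool:
    """Flag two records as duplicates when their similarity reaches the assumed threshold."""
    return record_similarity(a, b, fields) >= threshold


if __name__ == "__main__":
    r1 = {"name": "Jon Smith", "city": "Boston"}
    r2 = {"name": "John Smith", "city": "Boston"}
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    print(is_duplicate(r1, r2, ["name", "city"]))
```

A GP-based approach would replace the fixed average in `record_similarity` with an evolved expression over the per-field similarity values, so the weighting of each field is learned from the data rather than set by hand.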
Article Details
Unique Paper ID: 142476

Publication Volume & Issue: Volume 2, Issue 2

Page(s): 148 - 154
About Us

IJIRT.org opens doors in research by providing high-quality, open-access research articles.
