Feature selection is the process of identifying a subset of features that is useful for model construction while giving comparable results. Modern data repositories contain redundant and irrelevant features that can harm the quality of the solution, so such features must be removed to limit their negative effect on classifier accuracy. Methods for feature selection include exhaustive search, best-first search, simulated annealing, genetic algorithms, greedy forward selection, and many others. A Genetic Algorithm (GA) is a metaheuristic search technique belonging to the family of evolutionary algorithms, mostly used to find approximate solutions. Such heuristics are general-purpose methods applied mainly to optimization and search problems. GAs are computationally demanding in both time and resources. No frameworks exist for developing GAs that execute in parallel, although some sequential ones exist. Problems of this kind can therefore be addressed using Hadoop. Apache Hadoop is one of the common platforms that can be exploited for parallel applications; it is a software framework that stores and processes Big Data on clusters of commodity hardware without requiring complex programming models. The Hadoop Distributed File System (HDFS) is fault tolerant and holds very large amounts of data across multiple machines, and Hadoop provides a command interface for interacting with HDFS. This project focuses on presenting a new approach to feature selection that uses parallel GAs on the Hadoop platform, following the MapReduce paradigm.
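To make the idea of a GA organised around MapReduce more concrete, the following is a minimal single-machine sketch in Python, not the implementation described in the paper. It assumes that each chromosome is a bit mask over the features, that fitness evaluation plays the role of the map step (the part a Hadoop cluster would distribute), and that selection, crossover and mutation play the role of the reduce step. The names `fitness`, `map_phase`, `reduce_phase`, `run_ga`, the `RELEVANT` set and all parameter values are hypothetical, and the fitness function is a toy stand-in for classifier accuracy.

```python
import random

N_FEATURES = 8
# Hypothetical ground truth used only by the toy fitness below.
RELEVANT = {0, 3, 5}


def fitness(mask):
    """Toy stand-in for classifier accuracy (an assumption, not the paper's metric):
    reward each relevant feature selected, penalise each extra feature."""
    hits = sum(1 for i in RELEVANT if mask[i])
    extras = sum(mask) - hits
    return hits - 0.5 * extras


def map_phase(population):
    """Map step: score every chromosome independently.
    On Hadoop, this is the work that would be spread across mappers."""
    return [(fitness(mask), mask) for mask in population]


def reduce_phase(scored, rng, mutation_rate=0.05):
    """Reduce step: collect the scores and build the next generation."""
    scored = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Truncation selection: the better half become parents.
    parents = [mask for _, mask in scored[: len(scored) // 2]]
    # Elitism: carry the best chromosome over unchanged.
    children = [scored[0][1]]
    while len(children) < len(scored):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, N_FEATURES)  # single-point crossover
        child = a[:cut] + b[cut:]
        # Flip each bit with probability mutation_rate.
        child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
        children.append(child)
    return children


def run_ga(generations=30, pop_size=20, seed=0):
    """Run the GA and return (best_score, best_mask)."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(N_FEATURES))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population = reduce_phase(map_phase(population), rng)
    return max(map_phase(population), key=lambda pair: pair[0])


if __name__ == "__main__":
    score, mask = run_ga()
    selected = [i for i, bit in enumerate(mask) if bit]
    print("best score:", score, "selected features:", selected)
```

In a real Hadoop job the map step would train and score a classifier on the selected columns of data read from HDFS, which is where most of the computation time would go; the sketch keeps that step trivial so that the map and reduce structure stays visible.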
Article Details
Unique Paper ID: 142390
Publication Volume & Issue: Volume 2, Issue 1
Page(s): 244 - 251