Keywords: Bi-GRU, Classification, Deep Learning, FastText, GloVe, LSTM, NLP, Toxicity
Abstract
As social media platforms and online communities struggle to deal with increasing hate speech, abuse, and cyberbullying, toxic text detection has become a crucial natural language processing (NLP) challenge. In this paper, we propose and compare two deep learning models for toxicity detection: Long Short-Term Memory (LSTM) and Bi-Directional Gated Recurrent Unit (Bi-GRU) neural networks, used with GloVe and FastText word embeddings. We also applied ensemble techniques to the two top-performing models. We evaluated our models on a widely used dataset, the Jigsaw Toxic Comment Classification Challenge, and achieved an accuracy of 92.41% by combining Bi-GRU with GloVe embeddings and Bi-GRU with FastText embeddings through model averaging. While training the models, we also identified several research gaps. The first is bias: algorithmic bias in particular strongly influences what the models treat as abuse or slurs, leading to inaccurate and unfair predictions. The second is the lack of suitable training data, which hampers toxicity detection in conversations where language is dynamic and constantly changing. We proposed four research questions based on these gaps and attempted to answer them through experimentation and a literature review.
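The model-averaging ensemble mentioned above can be illustrated with a minimal sketch. It assumes each model outputs one sigmoid probability per Jigsaw label (toxic, severe_toxic, obscene, threat, insult, identity_hate) and that "accuracy" means label-wise accuracy at a 0.5 threshold; the abstract does not state either detail, so both are assumptions. The functions `average_predictions` and `label_accuracy` and the toy probabilities are hypothetical illustrations, not the authors' implementation or results.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def average_predictions(*model_probs: Matrix) -> Matrix:
    """Hypothetical model-averaging ensemble: element-wise mean of the
    per-label probabilities produced by each model (n_comments x n_labels)."""
    if not model_probs:
        raise ValueError("need at least one model's predictions")
    n_rows, n_cols = len(model_probs[0]), len(model_probs[0][0])
    for probs in model_probs:
        if len(probs) != n_rows or any(len(row) != n_cols for row in probs):
            raise ValueError("all prediction matrices must have the same shape")
    n_models = len(model_probs)
    return [
        [sum(probs[i][j] for probs in model_probs) / n_models for j in range(n_cols)]
        for i in range(n_rows)
    ]


def label_accuracy(probs: Matrix, targets: Sequence[Sequence[int]],
                   threshold: float = 0.5) -> float:
    """Assumed metric: fraction of (comment, label) pairs whose thresholded
    prediction matches the 0/1 target."""
    correct = total = 0
    for prob_row, target_row in zip(probs, targets):
        for p, t in zip(prob_row, target_row):
            correct += int((p >= threshold) == bool(t))
            total += 1
    return correct / total


# Hypothetical toy outputs for two comments and the six Jigsaw labels.
glove_probs = [[0.9, 0.6, 0.7, 0.1, 0.4, 0.1],     # stands in for Bi-GRU + GloVe
               [0.1, 0.0, 0.1, 0.0, 0.2, 0.0]]
fasttext_probs = [[0.7, 0.2, 0.4, 0.1, 0.8, 0.2],  # stands in for Bi-GRU + FastText
                  [0.3, 0.1, 0.0, 0.0, 0.1, 0.1]]
targets = [[1, 0, 1, 0, 1, 0],
           [0, 0, 0, 0, 0, 0]]

ensemble_probs = average_predictions(glove_probs, fasttext_probs)

if __name__ == "__main__":
    print(f"GloVe: {label_accuracy(glove_probs, targets):.3f}  "
          f"FastText: {label_accuracy(fasttext_probs, targets):.3f}  "
          f"Ensemble: {label_accuracy(ensemble_probs, targets):.3f}")
```

On this made-up data each single model misclassifies different labels, and averaging the probabilities cancels those individual errors. That is the usual motivation for averaging models built on different embeddings. Weighted averages or averaging logits instead of probabilities are common variants; the abstract does not say which was used.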
Article Details
Unique Paper ID: 160891
Publication Volume & Issue: Volume 10, Issue 1
Page(s): 1493 - 1511