JAIT 2023 Vol.14(6): 1206-1213
doi: 10.12720/jait.14.6.1206-1213

Enhancing Sentiment Analysis on Social Media with Novel Preprocessing Techniques

Khouloud Safi Eljil 1,2,*, Farid Nait-Abdesselam 3, Essia Hamouda 4, and Mohamed Hamdi 2
1. Computer Science Department, Université Paris Cité, France
2. Higher School of Communications, University of Carthage, SUP’COM, Tunisia
3. School of Science and Engineering, University of Missouri, Kansas City, USA
4. Department of Information and Decision Sciences, California State University, San Bernardino, USA
*Correspondence: khouloud.safi-eljil@etu.u-paris.fr (K.S.E.)

Manuscript received April 18, 2023; revised May 10, 2023; accepted July 13, 2023; published November 16, 2023.

Abstract—Sentiment analysis is a valuable tool, particularly for social media, because it reveals public opinion about specific products or topics. However, short and unstructured texts such as tweets make the task challenging. This paper evaluates conventional Machine Learning (ML) approaches, namely Naive Bayes, Logistic Regression, and Support Vector Machine, for sentiment analysis and compares them with Bidirectional Encoder Representations from Transformers (BERT). We also propose a new preprocessing technique for sentiment analysis that improves the effectiveness of these methods. Our findings show notable performance gains for the conventional ML models. BERT outperforms all of these models, reaching an accuracy of about 94%, though at a high computational cost, while Logistic Regression also performs well, with an accuracy of 90.35%. Regarding feature extraction, we show that combining unigram and bigram features captures negation more effectively than relying on unigrams alone. Finally, we propose an approach for handling emoticons and emojis that proves useful for both sentiment analysis and sarcasm interpretation.
Keywords—natural language processing, machine learning, feature extraction, social media, comparative analysis
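The abstract names two preprocessing ideas: mapping emoticons and emojis to sentiment-bearing tokens instead of discarding them, and combining unigram and bigram features so that a negation such as "not good" survives as a single feature. The minimal Python sketch below illustrates both ideas under stated assumptions. The emoticon table, the token names `emo_pos` and `emo_neg`, the cleaning rules, and the helpers `preprocess` and `ngram_features` are hypothetical; they are not the authors' actual preprocessing pipeline.

```python
import re
from collections import Counter

# Hypothetical emoticon/emoji table (an assumption for illustration, not the
# paper's mapping). Each symbol becomes a sentiment-bearing token instead of
# being stripped as noise.
EMOTICON_TOKENS = {
    ":)": "emo_pos",
    ":-)": "emo_pos",
    ":D": "emo_pos",
    ":(": "emo_neg",
    ":-(": "emo_neg",
    "\U0001F600": "emo_pos",  # grinning face emoji
    "\U0001F622": "emo_neg",  # crying face emoji
}


def preprocess(tweet: str) -> list[str]:
    """Clean a tweet and return its tokens (hypothetical cleaning rules)."""
    # Drop URLs and @mentions first so their characters are not
    # mistaken for emoticons.
    tweet = re.sub(r"https?://\S+", " ", tweet)
    tweet = re.sub(r"@\w+", " ", tweet)
    # Replace emoticons/emojis before lowercasing so ":D" still matches.
    for symbol, token in EMOTICON_TOKENS.items():
        tweet = tweet.replace(symbol, f" {token} ")
    # Lowercase and keep word-like tokens (letters, underscores, apostrophes).
    return re.findall(r"[a-z_']+", tweet.lower())


def ngram_features(tokens: list[str], n_max: int = 2) -> Counter:
    """Count unigrams plus all n-grams up to n_max (bigrams by default)."""
    features = Counter(tokens)
    for n in range(2, n_max + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


if __name__ == "__main__":
    tokens = preprocess("This movie is not good :( @bob http://t.co/x")
    print(tokens)
    # ['this', 'movie', 'is', 'not', 'good', 'emo_neg']
    feats = ngram_features(tokens)
    print(feats["not good"], feats["good"])
    # 1 1
```

With unigrams alone, "good" is the only feature tied to the sentiment word, so "good" and "not good" look alike to the model. The bigram "not good" gives a classifier such as Logistic Regression a separate feature that it can weight negatively. In scikit-learn, the same unigram-plus-bigram setting corresponds to `CountVectorizer(ngram_range=(1, 2))` or `TfidfVectorizer(ngram_range=(1, 2))`.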

Cite: Khouloud Safi Eljil, Farid Nait-Abdesselam, Essia Hamouda, and Mohamed Hamdi, "Enhancing Sentiment Analysis on Social Media with Novel Preprocessing Techniques," Journal of Advances in Information Technology, Vol. 14, No. 6, pp. 1206-1213, 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.