JAIT 2022 Vol.13(6): 652-661
doi: 10.12720/jait.13.6.652-661

Detection of Fake News Using Machine Learning and Natural Language Processing Algorithms

Noshin Nirvana Prachi, Md. Habibullah, Md. Emanul Haque Rafi, Evan Alam, and Riasat Khan
Electrical and Computer Engineering, North South University, Dhaka, Bangladesh

Abstract—The amount of information shared on the internet, primarily via web-based networking media, is steadily increasing. Because of the easy availability and exponential expansion of data through social media networks, distinguishing between fake and real information is not straightforward. Most smartphone users tend to read news on social media rather than on news websites. The information published on news websites often needs to be authenticated. The ease of spreading information and news through instant sharing has been accompanied by exponential growth in its misrepresentation. As a result, fake news has been a major issue ever since the internet became widely available to the general public. This paper employs several machine learning, deep learning, and natural language processing techniques for detecting false news: logistic regression, decision tree, naive Bayes, support vector machine (SVM), long short-term memory (LSTM), and bidirectional encoder representations from transformers (BERT). Initially, the machine learning and deep learning approaches are trained using an open-source fake news detection dataset to determine whether the information is authentic or counterfeit. In this work, the corresponding feature vectors are generated using various feature engineering methods, such as regular-expression cleaning, tokenization, stop-word removal, lemmatization, and term frequency-inverse document frequency (TF-IDF). The performance of all the machine learning and natural language processing models was evaluated in terms of accuracy, precision, recall, F1 score, and ROC curve, among other metrics. For the machine learning models, logistic regression, decision tree, naive Bayes, and SVM achieved classification accuracies of 73.75%, 89.66%, 74.19%, and 76.65%, respectively. Finally, the LSTM attained 95% accuracy, and the NLP-based BERT technique obtained the highest accuracy of 98%.
 
Index Terms—bidirectional encoder representations from transformers, fake news detection, lemmatization, long short-term memory, naive Bayes, support vector machine, tokenization
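
As an illustration of the feature engineering steps named in the abstract, the following is a minimal sketch, not the authors' implementation. It applies regular-expression cleaning, whitespace tokenization, and stop-word removal, then computes TF-IDF weights as tf × log(N / df). The function names, the tiny stop-word list, and the two sample sentences are assumptions made for this example. Lemmatization is omitted because it typically requires an external lexical resource.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a much larger one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}


def preprocess(text):
    """Lowercase, replace non-letters via regex, tokenize on whitespace, drop stop words."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample headlines, invented for this sketch.
docs = [
    "The senate passed the budget bill on Tuesday.",
    "Aliens passed a secret budget in the senate!",
]
vectors = tfidf(docs)
```

With this unsmoothed weighting, a term that occurs in every document receives a weight of zero, so only the terms that distinguish one article from another contribute to the feature vector. Vectors of this kind could then be passed to a classifier such as logistic regression or an SVM.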
 
Cite: Noshin Nirvana Prachi, Md. Habibullah, Md. Emanul Haque Rafi, Evan Alam, and Riasat Khan, "Detection of Fake News Using Machine Learning and Natural Language Processing Algorithms," Journal of Advances in Information Technology, Vol. 13, No. 6, pp. 652-661, December 2022.

Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.