JAIT 2025 Vol.16(11): 1511-1519
doi: 10.12720/jait.16.11.1511-1519

Advancing Sentiment Analysis for Colloquial Arabic: A Comparison of Machine Learning and Lexicon-Based Approaches for the Palestinian Dialect

Khalid Rabaya 1, Ahmad Hasasneh 2,*, and Khalil Rantisi 3
1. Department of Computer Science, Faculty of Information Technology, Arab American University, Jenin, Palestine
2. Department of Natural, Engineering, and Technology Sciences, Faculty of Graduate Studies, Arab American University, Ramallah, Palestine
3. Department of Computer Science, Faculty of Information Technology, Al-Quds Open University, Ramallah, Palestine
Email: khalid.rabayah@aaup.edu (K.R.); ahmad.hasasneh@aaup.edu (A.H.); khalil.rantisi@aaup.edu (K.Ran.)
*Corresponding author

Manuscript received May 19, 2025; revised June 30, 2025; accepted August 12, 2025; published November 7, 2025.

Abstract—Formal Arabic is less commonly used in everyday communication, especially on social media, where colloquial Arabic dominates. However, most Arabic Natural Language Processing (NLP) tools, including sentiment analysis systems, are primarily developed for Modern Standard Arabic (MSA), making it difficult to analyze dialectal content. This study addresses this gap by focusing on the Palestinian dialect, which is widely used across the Levant region. Two main approaches were explored: machine learning and lexicon-based methods. A manually labeled dataset of Facebook telecom service reviews was collected, cleaned, and split into training and testing sets. Five machine learning algorithms and a neural network model (Multilayer Perceptron) were evaluated, while the lexicon-based approach used a predefined sentiment lexicon for colloquial Arabic. The machine learning models outperformed the lexicon-based method, achieving accuracy scores between 92.2% and 94.0%, with Naïve Bayes yielding the highest F1-Score (95.0%). The neural network model achieved 95.0% in both accuracy and F1-Score, while the lexicon-based method scored 81.0% accuracy and a 67.0% F1-Score. Although no hybrid model was implemented, the findings suggest future research could explore combining lexical and machine learning approaches.
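
As an illustration of the lexicon-based approach and of the two metrics reported above, the following is a minimal sketch in Python. It is not the authors' implementation. The four-word lexicon, the five example reviews, the whitespace tokenizer, the rule that a zero score maps to the negative class, and the choice of the positive class for the F1-Score are all assumptions made for this example. The numbers it prints come from this toy data only and do not reproduce the results of the study.

```python
from typing import Dict, List

# Hypothetical toy lexicon (NOT the lexicon used in the study): word -> polarity.
TOY_LEXICON: Dict[str, int] = {
    "ممتاز": 1,   # excellent
    "حلو": 1,     # nice
    "سيء": -1,    # bad
    "بطيء": -1,   # slow
}


def lexicon_label(text: str, lexicon: Dict[str, int] = TOY_LEXICON) -> str:
    """Sum the polarities of known words; unknown words contribute 0."""
    score = sum(lexicon.get(token, 0) for token in text.split())
    # Assumption: a zero score (no match, or a tie) falls back to "neg".
    return "pos" if score > 0 else "neg"


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: List[str], y_pred: List[str], positive: str = "pos") -> float:
    """Harmonic mean of precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, manually labeled toy reviews (text, gold label).
REVIEWS = [
    ("الانترنت ممتاز", "pos"),
    ("الانترنت بطيء", "neg"),
    ("السعر سيء", "neg"),
    ("الخدمة حلوة", "pos"),  # inflected form "حلوة" is missing from the lexicon
    ("الدعم سيء", "neg"),
]

y_true = [label for _, label in REVIEWS]
y_pred = [lexicon_label(text) for text, _ in REVIEWS]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"F1 (pos) = {f1_score(y_true, y_pred):.2f}")
```

In this toy run, the fourth review is misclassified because its inflected adjective is absent from the lexicon. That single miss lowers recall on the positive class, so the F1-Score falls further than accuracy does. This is one way a lexicon-based classifier can show a large gap between the two metrics, although the sketch does not establish that this is the cause of the gap reported in the study.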
 
Keywords—language, social media, colloquial Arabic, machine learning, lexicon-based approach, sentiment analysis, multilayer perceptron classifier

Cite: Khalid Rabaya, Ahmad Hasasneh, and Khalil Rantisi, "Advancing Sentiment Analysis for Colloquial Arabic: A Comparison of Machine Learning and Lexicon-Based Approaches for the Palestinian Dialect," Journal of Advances in Information Technology, Vol. 16, No. 11, pp. 1511-1519, 2025. doi: 10.12720/jait.16.11.1511-1519

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
