JAIT 2022 Vol.13(6): 562-568
doi: 10.12720/jait.13.6.562-568
N-gram and Word2Vec Feature Engineering Approaches for Spam Recognition on Some Influential Twitter Topics in Saudi Arabia
Ahmed M. Balfagih, Vlado Keselj, and Stacey Taylor
Faculty of Computer Science, Dalhousie University, Halifax, Canada
Abstract
Social media platforms such as Twitter have become powerful sources of information on people’s perception of major events. Many people use Twitter to express their views on various issues and events, and to form their opinions on the diverse economic, political, technical, and social occurrences related to their daily lives. Spam and irrelevant tweets are a major challenge for Twitter trend detection. Saudi Arabia is a top-ranked country in Twitter usage worldwide, and in recent years it has experienced difficulties due to the rise of hashtags based on misleading tweets and spam. The goal of this paper is to apply machine learning techniques to identify spam in Saudi tweets collected up to the end of 2020. To date, spam detection on Twitter data has mostly been done in English, leaving other major languages, such as Arabic, insufficiently covered. Additionally, publicly accessible Arabic Twitter datasets are hard to find. For our research, we use eight Twitter datasets on significant topics in politics, health, national affairs, economy, and sport to train and evaluate different machine learning algorithms, with a focus on two feature generation techniques based on N-grams and Word2Vec embeddings. One contribution of this paper is providing these new labelled datasets with embeddings. The experimental results show that embeddings improve over N-grams on the more balanced datasets compared with the more unbalanced ones. We also find that the Random Forest algorithm outperforms the other algorithms in most experiments.
Index Terms
Twitter, spam detection, machine learning, preprocessing, social media
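The N-gram feature generation named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `word_ngrams` and `ngram_features`, the whitespace tokenization, and the toy English texts are all assumptions made for illustration, and the abstract does not say which N-gram settings or tooling were used.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of a text (hypothetical helper).

    Tokenization is a plain whitespace split, which is an assumption;
    real Arabic tweets would need proper normalization first.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(texts, n=2):
    """Build a sorted n-gram vocabulary and one count vector per text."""
    counts = [Counter(word_ngrams(t, n)) for t in texts]
    # The vocabulary is the union of all n-grams seen in any text.
    vocab = sorted(set().union(*counts))
    # Each text becomes a vector of n-gram counts in vocabulary order.
    vectors = [[c[g] for g in vocab] for c in counts]
    return vocab, vectors


# Toy example with two short texts and bigrams (n=2).
vocab, X = ngram_features(["win a free prize", "a free ticket"], n=2)
print(vocab)  # ['a free', 'free prize', 'free ticket', 'win a']
print(X)      # [[1, 1, 0, 1], [1, 0, 1, 0]]
```

Count vectors of this kind could then be passed to a classifier such as a Random Forest. Word2Vec embeddings would instead map each token to a dense vector learned from context, typically averaged per tweet.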
Cite: Ahmed M. Balfagih, Vlado Keselj, and Stacey Taylor, "N-gram and Word2Vec Feature Engineering Approaches for Spam Recognition on Some Influential Twitter Topics in Saudi Arabia," Journal of Advances in Information Technology, Vol. 13, No. 6, pp. 562-568, December 2022.
Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.