JAIT 2025 Vol.16(9): 1226-1235
doi: 10.12720/jait.16.9.1226-1235

Optimization Techniques for Dealing with Small Dataset for Sentiment Analysis

Isfaque AL Kaderi Tuhin *, Zhengkui Wang, Xiaorong Li, and Wei Zhang
Information and Communications Technology, Singapore Institute of Technology, Singapore, Singapore
Email: tuhin.kaderi@singaporetech.edu.sg (I.A.K.T.); zhengkui.wang@singaporetech.edu.sg (Z.W.); xiaorong.li@singaporetech.edu.sg (X.L.); wei.zhang@singaporetech.edu.sg (W.Z.)
*Corresponding author

Manuscript received February 3, 2025; revised March 4, 2025; accepted May 29, 2025; published September 5, 2025.

Abstract—Sentiment analysis is crucial for many organizations, including those in the transportation industry, which use it to gain insights into current issues and improve the services provided by public transport operators. However, industries such as transportation face difficulties in fully utilizing AI tools due to the lack of annotated, domain-specific datasets. This scarcity often stems from challenges such as the sensitive nature of the data and a shortage of manpower dedicated to data annotation. Although many sentiment analysis technologies exist, including state-of-the-art transformer-based models, they typically require access to large, annotated datasets. This creates a gap in solutions for scenarios characterized by limited and imbalanced data. Our research aims to address this gap by systematically exploring strategies for optimizing sentiment analysis with small, imbalanced datasets in multi-class sentiment classification tasks. We consider the constraints posed by data privacy and resource limitations, and propose methodologies that enhance sentiment analysis accuracy without the need for large datasets or extensive annotation efforts. Using RoBERTa, a transformer-based pre-trained model designed for sentiment analysis, together with a combination of optimization and data augmentation techniques, we aim to extend the capabilities of sentiment analysis models so that they perform effectively in data-sparse situations. Our approach addresses the challenges of small datasets and contributes to the broader field of sentiment analysis by offering scalable solutions that can be adapted to various domain-specific environments. Our experiments achieved significant improvements in prediction accuracy, demonstrating the feasibility and effectiveness of our approach. By integrating theoretical insights with practical applications, our study sheds light on the untapped potential of small datasets in sentiment analysis.
It provides a roadmap for leveraging advanced optimization techniques and inn
 
Keywords—Natural Language Processing (NLP), sentiment analysis, small data, imbalanced data, transformers

Cite: Isfaque AL Kaderi Tuhin, Zhengkui Wang, Xiaorong Li, and Wei Zhang, "Optimization Techniques for Dealing with Small Dataset for Sentiment Analysis," Journal of Advances in Information Technology, Vol. 16, No. 9, pp. 1226-1235, 2025. doi: 10.12720/jait.16.9.1226-1235

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
