JAIT 2026 Vol.17(2): 300-310
doi: 10.12720/jait.17.2.300-310

Enhanced Sentiment Analysis Using Extended Corpus and Co-Occurrence Graph

Nirach Romyen 1,*, Herwig Unger 2, and Maleerat Maliyaem 1,*
1. Department of Information Technology, Faculty of Information Technology and Digital Innovation, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
2. Department of Mathematics and Computer Science, University of Hagen, Hagen, Germany
Email: nirach@sci.tu.ac.th (N.R.); herwig.unger@gmail.com (H.U.); maleerat.m@itd.kmutnb.ac.th (M.M.)
*Corresponding author

Manuscript received June 18, 2025; revised September 1, 2025; accepted November 9, 2025; published February 10, 2026.

Abstract—Lexicon-based sentiment analysis is often limited by the narrow coverage and poor adaptability of existing sentiment lexicons, particularly in domain-specific or resource-scarce contexts. Constructing new lexicons from scratch is labor-intensive and rarely feasible for large-scale applications. To address this gap, we introduce the Enhanced Sentiment Analysis using Extended Corpus and Co-Occurrence Graph (ECG) framework, which combines co-occurrence graph construction, triadic closure theory, and PageRank-inspired weighting to propagate sentiment values from a small seed set to a broader vocabulary. Our study focuses on nouns extracted from the Harry Potter corpus, since such nouns are typically underrepresented in standard sentiment resources. Experimental results show that ECG's sentiment propagation converges stably within 1500 iterations. Evaluation using the Coefficient of Variation (COV) indicates improved sentiment differentiation and broader coverage than benchmark lexicons such as SentiWordNet, the Valence Aware Dictionary and Sentiment Reasoner (VADER), and the National Research Council (NRC) lexicon. For instance, ECG consistently assigned interpretable sentiment values to narrative-specific entities (e.g., “Slytherin”, “Voldemort”) that conventional lexicons often missed. These findings highlight ECG’s scalability, interpretability, and suitability for low-resource scenarios. Beyond literary analysis, the framework can be extended to other domains such as social media, customer feedback, and specialized corpora. Overall, ECG is a resource-efficient alternative that addresses the limitations of static lexicons and improves sentiment inference in diverse contexts.
 
Keywords—sentiment analysis, co-occurrence graph, sentiment propagation, triadic closure, lexical expansion
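
The propagation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sentence-level co-occurrence window, the closure edge weight of 0.5, the damping factor of 0.85, and the damped weighted-average update rule are all assumptions chosen for illustration. Only the overall pipeline (co-occurrence graph, triadic closure, iterative propagation from seed words, Coefficient of Variation) follows the abstract.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def build_cooccurrence_graph(sentences):
    """Undirected graph; edge weight = number of sentences in which both words occur."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def close_triads(graph, weight=0.5):
    """Assumed form of triadic closure: link two words that share a neighbour."""
    new_edges = set()
    for neighbours in graph.values():
        for a, b in combinations(sorted(neighbours), 2):
            if b not in graph[a]:
                new_edges.add((a, b))
    # Add the collected edges only after the scan, so the graph is not
    # modified while it is being iterated.
    for a, b in new_edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def propagate(graph, seeds, damping=0.85, max_iter=1500, tol=1e-9):
    """Spread seed sentiment; each non-seed word takes a damped weighted
    average of its neighbours' scores (an assumed PageRank-like rule)."""
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(max_iter):
        updated = {}
        for node, neighbours in graph.items():
            if node in seeds:
                updated[node] = seeds[node]  # seed values stay fixed
                continue
            total = sum(neighbours.values())
            avg = sum(w * scores[m] for m, w in neighbours.items()) / total
            updated[node] = damping * avg
        delta = max(abs(updated[n] - scores[n]) for n in graph)
        scores = updated
        if delta < tol:  # stop once the scores no longer change
            break
    return scores


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    return pstdev(values) / abs(mean(values))


# Toy corpus and seed lexicon (hypothetical data, not taken from the paper).
sentences = [
    ["voldemort", "curse", "death"],
    ["voldemort", "fear"],
    ["friend", "gift", "joy"],
    ["friend", "hogwarts"],
    ["hogwarts", "feast", "joy"],
]
seeds = {"death": -1.0, "joy": 1.0}

graph = close_triads(build_cooccurrence_graph(sentences))
scores = propagate(graph, seeds)
for word in sorted(scores, key=scores.get):
    print(f"{word:10s} {scores[word]:+.3f}")
```

In this toy run, words that only co-occur with the negative seed (such as "voldemort") end up with negative scores, and words connected to the positive seed (such as "hogwarts") end up positive, even though neither appears in the seed lexicon. The damping factor keeps the update a contraction, so the loop stops well before the 1500-iteration cap on a graph this small.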

Cite: Nirach Romyen, Herwig Unger, and Maleerat Maliyaem, "Enhanced Sentiment Analysis Using Extended Corpus and Co-Occurrence Graph," Journal of Advances in Information Technology, Vol. 17, No. 2, pp. 300-310, 2026. doi: 10.12720/jait.17.2.300-310

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
