JAIT 2025 Vol.16(11): 1644-1663
doi: 10.12720/jait.16.11.1644-1663

Ontology-Based Topic Model for Document Retrieval Systems in Information Technology

Thanh Dien Nguyen 1,*, Van Nhon Do 2,*, and Hoang Tung Tran 3
1. Department of Computing Fundamental, FPT University, Ho Chi Minh City, Vietnam
2. Department of Information Technology, Hong Bang International University, Ho Chi Minh City, Vietnam
3. Department of Information and Communication Technology, Vietnam France University, Ha Noi, Vietnam
Email: diennt4@fe.edu.vn (T.D.N.); nhondv@hiu.vn (V.N.D.); tran-hoang.tung@usth.edu.vn (H.T.T.)
*Corresponding author

Manuscript received June 26, 2025; revised August 1, 2025; accepted August 29, 2025; published November 25, 2025.

Abstract—Most current academic document retrieval systems for topic-based search rely on simple keyword matching or statistical topic modeling. In these methods, topics are formed either from sets of frequent keywords or from statistical clusters. While these approaches work in some contexts, they cannot fully capture the rich semantic meaning of topics as understood by human experts. This often leads to search results that fail to match the intended meaning of the topic, causing gaps between what users need and what the system returns. This study aims to overcome these limitations by developing a topic-based retrieval system that represents topics in a more semantically rich and human-aligned way. The system is designed to help Information Technology (IT) students search for topic-relevant materials—specifically English-language ebooks and research papers—from a curated faculty repository. We introduce C-ONTO, a structured knowledge model that includes topic names, descriptions, learning objectives, real-world application scenarios, and concept graphs describing internal semantic relationships. Documents are also modeled as concept graphs, enabling accurate semantic similarity calculations. An intelligent query analysis module interprets user intent and maps it to the system’s semantic structure. Testing with 425 real student queries in four IT domains shows that the system achieves 81.18% accuracy, outperforming keyword-based and statistical methods in precision, recall, coverage, and F1-Score. By aligning computational topic modeling with human understanding, the proposed system improves accuracy, semantic consistency, and educational relevance in academic document retrieval for information technology and related fields.
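As an illustration of the idea that topics and documents are both represented as concept graphs and compared semantically, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the similarity measure, so the class names (`ConceptGraph`, `Topic`), the Jaccard-based `graph_similarity` function, the `concept_weight` parameter, and the sample data are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptGraph:
    """A set of concepts plus labeled relations between them."""
    concepts: set[str] = field(default_factory=set)
    # Each relation is a (source concept, relation label, target concept) triple.
    relations: set[tuple[str, str, str]] = field(default_factory=set)


@dataclass
class Topic:
    """Hypothetical record mirroring the topic components listed in the abstract."""
    name: str
    description: str
    learning_objectives: list[str]
    application_scenarios: list[str]
    graph: ConceptGraph


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def graph_similarity(g1: ConceptGraph, g2: ConceptGraph,
                     concept_weight: float = 0.5) -> float:
    """Blend concept overlap and relation overlap into one score in [0, 1].

    Assumption: a weighted Jaccard blend stands in for whatever measure
    the paper actually uses.
    """
    concept_score = jaccard(g1.concepts, g2.concepts)
    relation_score = jaccard(g1.relations, g2.relations)
    return concept_weight * concept_score + (1 - concept_weight) * relation_score


def rank_documents(topic: Topic,
                   documents: dict[str, ConceptGraph]) -> list[tuple[str, float]]:
    """Return (document id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, graph_similarity(topic.graph, graph))
              for doc_id, graph in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data only; none of it comes from the paper.
sorting_topic = Topic(
    name="Sorting algorithms",
    description="Methods for ordering the elements of a collection.",
    learning_objectives=["Compare the complexity of common sorting methods"],
    application_scenarios=["Ordering search results"],
    graph=ConceptGraph(
        concepts={"sorting", "array", "complexity"},
        relations={("sorting", "operates_on", "array")},
    ),
)

documents = {
    "doc_a": ConceptGraph(
        concepts={"sorting", "array"},
        relations={("sorting", "operates_on", "array")},
    ),
    "doc_b": ConceptGraph(concepts={"network", "protocol"}),
}

ranking = rank_documents(sorting_topic, documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this sketch a document that shares both concepts and relations with the topic ranks above one that shares neither. A real system would likely also need concept normalization (synonyms, hierarchy) before comparing graphs, which is omitted here.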
 
Keywords—ontology, knowledge representation, topic modeling, semantic document retrieval, concept graph

Cite: Thanh Dien Nguyen, Van Nhon Do, and Hoang Tung Tran, "Ontology-Based Topic Model for Document Retrieval Systems in Information Technology," Journal of Advances in Information Technology, Vol. 16, No. 11, pp. 1644-1663, 2025. doi: 10.12720/jait.16.11.1644-1663

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
