JAIT 2025 Vol.16(8): 1048-1060
doi: 10.12720/jait.16.8.1048-1060

TEXTWISE: Text Exploration through Interactive Natural Language Processing for Wide-Ranging Insights and Semantic Exploration

Hema Pandey 1, Benjamin Kloepper 1, Ruben Huehnerbein 2, Muskaan Singh 3, Binh Vu 1, Sina Mehraeen 1, Mehrdad Jalali 1, and Swati Chandna 1,*
1. Applied Data Science and Analytics, SRH University Heidelberg, Heidelberg, Germany
2. Industrial Data Analytics, ABB Corporate Research, Ladenburg, Germany
3. School of Computing, Engineering and Intelligent Systems, Ulster University, Derry/Londonderry, UK
Email: hemapandey.srh@gmail.com (H.P.); kloepper@posteo.de (B.K.); ruben.huehnerbein@de.abb.com (R.H.); m.singh@ulster.ac.uk (M.S.); binh.vu@srh.de (B.V.); sina.mehraeen@srh.de (S.M.); mehrdad.jalali@srh.de (M.J.); swati.chandna@srh.de (S.C.)
*Corresponding author

Manuscript received August 11, 2024; revised September 6, 2024; accepted February 21, 2025; published August 8, 2025.

Abstract—The task of generating accurately labeled datasets in Natural Language Processing (NLP) is notably challenging due to the high cost and extensive time requirements, compounded by the reliance on large volumes of unstructured data scraped from the web. Addressing this, our research introduces a novel framework utilizing Explanatory Interactive Machine Learning (XIL) and Explainable Artificial Intelligence (XAI). This framework enables the dynamic labeling of text data without predefined categories, significantly reducing the dependence on human annotators. Our methodology employs a topic modeling approach that allows a single annotator to label data efficiently with minimal oversight. In testing, this method trained a classifier on as few as 600 documents, achieving a precision of approximately 0.70. This precision is comparable to that of a classifier trained on a fully labeled dataset of 13,000 documents, demonstrating our system’s effectiveness while using less than 5% of the labeled data typically required. These findings highlight how our approach not only enhances the transparency of the labeling process but also reduces its resource intensity, offering substantial improvements over traditional methods in both scalability and efficiency. This proof of concept paves the way for broader applications of explainable interactive NLP across various domains.
 
Keywords—Explainable Artificial Intelligence (XAI), Explanatory Interactive Machine Learning (XIL), text mining, topic modelling, text labeling, unsupervised learning

Cite: Hema Pandey, Benjamin Kloepper, Ruben Huehnerbein, Muskaan Singh, Binh Vu, Sina Mehraeen, Mehrdad Jalali, and Swati Chandna, "TEXTWISE: Text Exploration through Interactive Natural Language Processing for Wide-Ranging Insights and Semantic Exploration," Journal of Advances in Information Technology, Vol. 16, No. 8, pp. 1048-1060, 2025. doi: 10.12720/jait.16.8.1048-1060

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
