JAIT 2025 Vol.16(7): 1030-1041
doi: 10.12720/jait.16.7.1030-1041

AI-Generated Text Detection and Source Identification

Anjana Priyatham Tatavarthi*, Faranak Abri*, and Nada Attar
Department of Computer Science, San José State University, San José, USA
Email: anjanapriyatham.tatavarthi@sjsu.edu (A.P.T.); faranak.abri@sjsu.edu (F.A.); nada.attar@sjsu.edu (N.A.)
*Corresponding author

Manuscript received February 3, 2025; revised March 17, 2025; accepted May 13, 2025; published July 25, 2025.

Abstract—Detecting AI-generated text is a practical and increasingly important application of advanced machine learning. The ability to distinguish human-written content from machine-generated text while identifying the source generative model helps address growing concerns about authenticity and accountability in digital communication. Differentiating human-generated from AI-generated text is relevant to applications ranging from news media to academic integrity and is key to ensuring transparency and trust in content-driven environments. However, existing models often fail to accurately detect AI-generated text and determine its specific AI source because of the complex nature of machine-generated content. Addressing this requires state-of-the-art machine learning models and embedding techniques that can capture the subtle linguistic and contextual patterns of AI-generated text. In this study, text classification experiments were conducted to develop models capable of distinguishing AI-generated content from human-written text and identifying the specific AI model used, offering a multilayered approach to detection. The results demonstrate that the Long Short-Term Memory (LSTM) model with Bidirectional Encoder Representations from Transformers (BERT) embeddings outperformed other embedding techniques at binary classification, achieving 97% for both accuracy and F1. Additionally, this study illustrates the superior performance of pretrained transformer-based models over Recurrent Neural Network (RNN)-based models for four-class source identification, with the Robustly optimized BERT approach (RoBERTa) achieving 88% for both accuracy and F1. This highlights the advantage of leveraging powerful Large Language Models (LLMs) for the complex task of source identification, offering a more robust and scalable solution than traditional approaches.
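
To make the binary-classification approach concrete, the following is a minimal PyTorch sketch of an LSTM classifier over frozen BERT embeddings. The checkpoint name (bert-base-uncased), hidden size, and label convention are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

class BertLstmClassifier(nn.Module):
    def __init__(self, hidden_size=128, num_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():   # freeze BERT; use it as an embedder only
            p.requires_grad = False
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():   # contextual token embeddings from frozen BERT
            emb = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        _, (h_n, _) = self.lstm(emb)   # final hidden state summarizes the sequence
        return self.fc(h_n[-1])        # logits: human-written vs. AI-generated

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["An example passage to classify."], return_tensors="pt",
                  padding=True, truncation=True, max_length=128)
model = BertLstmClassifier()
logits = model(batch["input_ids"], batch["attention_mask"])
pred = logits.argmax(dim=-1)   # assumed convention: 0 = human, 1 = AI-generated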
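
For the four-class source-identification task, a fine-tuned transformer can be sketched as below with RoBERTa and a sequence-classification head. The roberta-base checkpoint, the label names, and the learning rate are placeholder assumptions; the paper does not specify them here.

import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

labels = ["human", "model_a", "model_b", "model_c"]   # hypothetical four classes
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(labels))

batch = tokenizer(["Which source produced this paragraph?"], return_tensors="pt",
                  padding=True, truncation=True, max_length=128)
targets = torch.tensor([0])   # toy gold label for this single example

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=targets)   # cross-entropy loss computed internally
outputs.loss.backward()                    # one gradient step of fine-tuning
optimizer.step()
optimizer.zero_grad()

pred = outputs.logits.argmax(dim=-1)
print(labels[pred.item()])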
 
Keywords—AI-generated text, Bidirectional Encoder Representations from Transformers (BERT) model, Bidirectional Long Short-Term Memory (BiLSTM), Deep Learning (DL), Large Language Models (LLMs), Long Short-Term Memory (LSTM), Machine Learning (ML), Robustly optimized BERT approach (RoBERTa) model, word embeddings

Cite: Anjana Priyatham Tatavarthi, Faranak Abri, and Nada Attar, "AI-Generated Text Detection and Source Identification," Journal of Advances in Information Technology, Vol. 16, No. 7, pp. 1030-1041, 2025. doi: 10.12720/jait.16.7.1030-1041

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
