JAIT 2026 Vol.17(3): 438-449
doi: 10.12720/jait.17.3.438-449

Testing the Limits: Evaluating AI Detectors’ Accuracy and the Impact of Obfuscation Techniques on AI-Generated Text

Alfira Makhmutova 1,*, Batyr Sharimbayev 2, Altynbek Amirzhanov 3, and Ardak Shalkarbay-uly 4
1. Department of General Education, New Uzbekistan University, Tashkent, Uzbekistan
2. Department of Information Systems, SDU University, Kaskelen, Kazakhstan
3. Department of Mathematics and Natural Sciences, SDU University, Kaskelen, Kazakhstan
4. Institute of Digital Transformation and Artificial Intelligence, Narxoz University, Almaty, Kazakhstan
Email: a.makhmutova@newuu.uz (A.M.); batyr.sharimbayev@sdu.edu.kz (B.S.);
altynbek.amirzhanov@sdu.edu.kz (A.A.); ardak.shalkar@gmail.com (A.S.)
*Corresponding author

Manuscript received July 24, 2025; revised August 27, 2025; accepted October 31, 2025; published March 10, 2026.

Abstract—The rise of Artificial Intelligence (AI)-generated text has led to the development of numerous detection tools that aim to distinguish human-authored from machine-authored content. However, the effectiveness of these tools, especially against manipulated texts, remains uncertain. This study evaluates nine widely used AI detection tools—Turnitin, ZeroGPT, Detecting-AI.com, GPTZero, QuillBot, Grammarly, Sapling, Copyleaks, and Originality.ai—using texts from four large language models—ChatGPT, DeepSeek, Gemini, and Grok—as well as human-written samples. Initial findings indicate that commercial tools, such as Copyleaks and Originality.ai, achieved near-perfect detection rates, while free tools, including Grammarly and QuillBot, performed less reliably, with accuracy as low as 63.0% in some cases. However, paraphrasing and Non-Native English Speaker (NNES)-style rewriting techniques reduced detection accuracy across most detectors: Turnitin dropped to 45.7%, and Grammarly fell to 19.0% in some cases. Only Copyleaks, GPTZero, and Sapling maintained strong performance under obfuscation. The study highlights three issues: inconsistent detector performance, the impact of obfuscation, and ethical risks, including bias and false positives. It concludes that while some detectors offer robust baseline performance, combining them with pedagogical strategies and institutional policies is essential to uphold academic integrity.
 
Keywords—AI detection tools, large language models, obfuscation techniques, academic integrity, ethical implications

Cite: Alfira Makhmutova, Batyr Sharimbayev, Altynbek Amirzhanov, and Ardak Shalkarbay-uly, "Testing the Limits: Evaluating AI Detectors’ Accuracy and the Impact of Obfuscation Techniques on AI-Generated Text," Journal of Advances in Information Technology, Vol. 17, No. 3, pp. 438-449, 2026. doi: 10.12720/jait.17.3.438-449

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
