JAIT 2025 Vol.16(8): 1118-1126
doi: 10.12720/jait.16.8.1118-1126

Evaluating Large Language Models for Table Data Extraction from Annual Reports in PDF

Ngoc Bao Nguyen 1, Levi Kammermann 1, and Thomas Hanne 2,*
1. School of Business, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland
2. Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland
Email: ngocbao.nguyen@students.fhnw.ch (N.B.N.); levi.kammermann@students.fhnw.ch (L.K.); thomas.hanne@fhnw.ch (T.H.)
*Corresponding author

Manuscript received December 3, 2024; revised December 26, 2024; accepted April 8, 2025; published August 18, 2025.

Abstract—This paper assesses which Large Language Model (LLM), ChatGPT 3.5 or Google Bard (Bard), performs better at extracting tabular data from annual reports of major publicly traded companies in Portable Document Format (PDF). Through an examination of 40 experiments with different difficulty levels, this study investigates the capabilities of these LLMs in processing and extracting financial information. The evaluation metrics, including relevance, accuracy, completeness, consistency, and context awareness, reveal varying degrees of proficiency for both ChatGPT 3.5 and Bard when handling complex financial data. In our study, ChatGPT 3.5 demonstrated superior performance, particularly on more challenging questions. The findings offer insights into the utility of LLMs in financial data extraction and provide recommendations for future research.
 
Keywords—Large Language Models (LLMs), table extraction, table content, information extraction, financial analysis, annual reports, evaluation

Cite: Ngoc Bao Nguyen, Levi Kammermann, and Thomas Hanne, "Evaluating Large Language Models for Table Data Extraction from Annual Reports in PDF," Journal of Advances in Information Technology, Vol. 16, No. 8, pp. 1118-1126, 2025. doi: 10.12720/jait.16.8.1118-1126

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
