JAIT 2026 Vol.17(3): 519-533
doi: 10.12720/jait.17.3.519-533

Hybrid Table Formats for Lakehouse Systems: A Functional Benchmark of Iceberg, Hudi, and Delta

Srinivas Lakkireddy
Independent Researcher, USA
Email: reachlakkireddy@gmail.com

Manuscript received September 27, 2025; revised November 17, 2025; accepted December 8, 2025; published March 10, 2026.

Abstract—The rapid development of Lakehouse architectures has produced hybrid table formats such as Apache Iceberg, Apache Hudi, and Delta Lake, which combine the transactional guarantees of data warehouses with the scalability of data lakes. Although these formats are increasingly used in enterprise data pipelines, existing benchmarking studies typically do not compare them in an integrated way across workload and functional dimensions. Previous studies mainly focus on individual performance metrics or narrow, case-specific analyses, and hence lack a comprehensive overview of feature support and suitability for hybrid workloads. This study aims to fill that gap with a functionality-driven benchmark framework for hybrid table formats that evaluates ingestion throughput, schema evolution, streaming support, compaction efficiency, and concurrency. It presents a Functional Capability Score (FCS) algorithm that uses qualitative output from 15 specific functional test cases to estimate format-specific capabilities. The framework is implemented as a reproducible Docker and Apache Spark-based benchmarking testbed that simulates both batch and streaming operations over synthetic and semi-realistic datasets in a fair and scalable manner. According to the experimental results, Delta Lake outperforms the other formats in both streaming and batch read speeds, Iceberg offers improved schema evolution through metadata-driven table versioning mechanisms, and Hudi provides high ingestion throughput for incremental and streaming workloads. The proposed framework enables a one-to-one mapping of workload types to suitable table formats, thereby supporting practical Lakehouse deployment efforts. This work thus provides a reproducible, extensible benchmark model that helps data engineers and architects choose suitable table formats, tune for better performance, and achieve future-proof scalability in hybrid data lake environments.
 
Keywords—hybrid table formats, Lakehouse benchmarking, functional capability score, Apache Iceberg Hudi Delta, data pipeline evaluation

Cite: Srinivas Lakkireddy, "Hybrid Table Formats for Lakehouse Systems: A Functional Benchmark of Iceberg, Hudi, and Delta," Journal of Advances in Information Technology, Vol. 17, No. 3, pp. 519-533, 2026. doi: 10.12720/jait.17.3.519-533

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
