JAIT 2026 Vol.17(6): 1113-1129
doi: 10.12720/jait.17.6.1113-1129

A Comparative Analysis of Apache Iceberg, Delta Lake, and Apache Hudi: Architecture, Performance, and Use-Case Suitability

Srinivas Lakkireddy
Independent Researcher, Buffalo Grove, USA
Email: reachlakkireddy@gmail.com

Manuscript received August 29, 2025; revised December 23, 2025; accepted February 24, 2026; published June 22, 2026.

Abstract—The data lakehouse is an architecture that combines the scalability of data lakes with the transactional consistency typically associated with data warehouses. These architectures are built on open table formats, such as Apache Iceberg, Delta Lake, and Apache Hudi, which support Atomicity, Consistency, Isolation, Durability (ACID) transactions, schema evolution, and time travel on distributed datasets. Existing comparative studies of these formats are limited: they often compare the formats only on specific benchmarks or take an approach that is not scenario-driven. This leaves architects and data engineers uncertain about which format to choose for a given workload. To address this gap, this work introduces a unified benchmarking and suitability evaluation framework that compares the formats across architectural paradigms on ingestion throughput, query latency, write amplification, and schema evolution performance, using both synthetic and real-world use cases. It proposes a weighted scoring model to calculate aggregate suitability scores and two algorithms that support scenario-oriented evaluation and ranking. All experiments are performed on Apache Spark 3, using synthetic Transaction Processing Performance Council Decision Support (TPC-DS) and Internet of Things (IoT) workloads in batch, streaming, and Slowly Changing Dimension (SCD) scenarios. Quantitatively, Apache Iceberg performs best of the three in terms of low write amplification and effective schema evolution handling for batch Extract, Transform, Load (ETL) workloads. Delta Lake’s ingestion and query performance is well balanced for hybrid scenarios, while Apache Hudi is better suited to streaming use cases with high-throughput ingestion, specifically in Merge-on-Read mode. The proposed framework makes these workload-dependent trade-offs explicit and supports practitioners with visual and quantitative decision-support tools. The results of the study can help guide architectural decisions for lakehouses in production-scale data environments.
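As an illustration of what a weighted scoring model for aggregate suitability might look like, the following is a minimal sketch and not the paper's actual formulation. It assumes each metric has already been normalized to [0, 1] with higher meaning better; the function names, criterion names, weights, and the placeholder formats `format_a` and `format_b` are all hypothetical and carry no measured results.

```python
from typing import Dict, List, Tuple


def suitability_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of normalized metric scores (assumed in [0, 1], higher is better)."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[k] * metrics[k] for k in weights) / total_weight


def rank_formats(
    metrics_by_format: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (format, score) pairs sorted from most to least suitable."""
    scored = {name: suitability_score(m, weights) for name, m in metrics_by_format.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scenario weights and placeholder inputs (not results from the study).
example_weights = {"ingestion": 2.0, "latency": 1.0, "write_amp": 1.0}
example_metrics = {
    "format_a": {"ingestion": 1.0, "latency": 0.0, "write_amp": 0.0},
    "format_b": {"ingestion": 0.0, "latency": 1.0, "write_amp": 0.5},
}
example_ranking = rank_formats(example_metrics, example_weights)
```

Changing the weights per scenario (for example, weighting ingestion more heavily for streaming) changes the ranking without re-running any benchmark, which is the general appeal of a scoring model of this kind.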
 
Keywords—Apache Iceberg, Delta Lake, Apache Hudi, data lakehouse, benchmarking framework

Cite: Srinivas Lakkireddy, "A Comparative Analysis of Apache Iceberg, Delta Lake, and Apache Hudi: Architecture, Performance, and Use-Case Suitability," Journal of Advances in Information Technology, Vol. 17, No. 6, pp. 1113-1129, 2026. doi: 10.12720/jait.17.6.1113-1129

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
