JAIT 2026 Vol.17(6): 1188-1210
doi: 10.12720/jait.17.6.1188-1210

Accelerated Quantum Algorithm Prototyping: Modular Simulation and Noise Modelling with CUDA Support

Elmin Marevac 1, Esad Kadušić 2, Nataša Živić 3,*, and Christoph Ruland 4
1. Polytechnic Faculty, University of Zenica, Zenica, Bosnia and Herzegovina
2. Faculty of Educational Sciences, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
3. Faculty of Digital Transformation, Leipzig University of Applied Sciences, Leipzig, Germany
4. Department of Electrical Engineering and Computer Science, University of Siegen, Siegen, Germany
Email: elmin.marevac@unze.ba (E.M.); ekadusic@pf.unsa.ba (E.K.); natasa.zivic@htwk-leipzig.de (N.Z.); christoph.ruland@uni-siegen.de (C.R.)
*Corresponding author

Manuscript received November 17, 2025; revised January 28, 2026; accepted March 17, 2026; published June 26, 2026.

Abstract—This work presents a portable, modular quantum-circuit emulator designed to accelerate algorithm prototyping by leveraging Central Processing Unit (CPU) and Compute Unified Device Architecture (CUDA)-enabled Graphics Processing Unit (GPU) backends for dense linear-algebra workloads. The emulator provides a lightweight backend abstraction that unifies NumPy and CuPy, supports both statevector and density-matrix representations, and implements efficient multi-target unitary embedding and Kraus-map noise channels. These capabilities enable rapid exploration of quantum algorithms that require full 2^n × 2^n operators, such as quantum phase estimation, full Quantum Fourier Transform (QFT), and noise-aware variational circuits, while minimizing host/device transfer and preserving numerical parity between backends. Implementation strategies are described for embedding single- and multi-qubit gates, parameterized rotations, and controlled unitaries, along with an “apply_unitary_on_targets” routine that employs reshape/matrix-multiply/reshape patterns to map arbitrary target sets to dense GPU kernels for high throughput. Using a curated benchmark suite of dense quantum workloads and microbenchmarks, including large matrix multiplies and Fast Fourier Transforms (FFT), performance is quantified across a range of qubit counts and precisions. The results demonstrate substantial speedups on modern NVIDIA GPUs for workloads dominated by dense linear algebra, identify cross-over points where GPU acceleration becomes beneficial, and analyse memory/precision trade-offs for complex64 versus complex128. The presented emulator lowers the barrier for noise-aware algorithm prototyping by combining practical software ergonomics with GPU performance, enabling faster iteration on algorithm design and noise-mitigation strategies on commodity single-node hardware.
While the abstraction layer ensures portability, achieving peak performance currently requires hardware-specific tuning, a limitation that future automation strategies are intended to address.
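
The reshape/matrix-multiply/reshape pattern mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function signature, the `xp` backend argument, and the qubit-ordering convention (qubit 0 as the most significant bit of the basis index) are assumptions made for illustration, and only the statevector case is shown. The array module is passed in so that `numpy` could in principle be swapped for `cupy`, which offers the same `reshape` and `moveaxis` calls.

```python
import numpy as np


def apply_unitary_on_targets(state, unitary, targets, n_qubits, xp=np):
    """Apply a 2^k x 2^k unitary to k target qubits of an n-qubit statevector.

    Assumed convention: qubit 0 is the most significant bit of the basis index.
    `xp` is the array module (numpy here; cupy is a hypothetical drop-in).
    """
    k = len(targets)
    front = list(range(k))
    # View the flat 2^n vector as a rank-n tensor, one axis of size 2 per qubit.
    psi = xp.reshape(state, (2,) * n_qubits)
    # Move the target axes to the front, in the order given by `targets`.
    psi = xp.moveaxis(psi, targets, front)
    # Collapse to a (2^k, 2^(n-k)) matrix so the gate is a single dense matmul.
    psi = unitary @ xp.reshape(psi, (2 ** k, -1))
    # Restore the tensor shape and undo the axis permutation.
    psi = xp.reshape(psi, (2,) * n_qubits)
    psi = xp.moveaxis(psi, front, targets)
    return xp.reshape(psi, (-1,))


# Usage: prepare a Bell state from |00> with a Hadamard and a CNOT.
hadamard = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
cnot = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]],
    dtype=np.complex128,
)

zero = np.zeros(4, dtype=np.complex128)
zero[0] = 1.0  # |00>

bell = apply_unitary_on_targets(zero, hadamard, [0], n_qubits=2)
bell = apply_unitary_on_targets(bell, cnot, [0, 1], n_qubits=2)  # control 0, target 1
print(np.round(bell, 3))
```

Because the gate acts as one matrix product on a (2^k, 2^(n-k)) array, the full 2^n × 2^n operator never has to be built for this step, and the same call shape maps onto a dense GPU matrix multiply when a CUDA-backed array module is used.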
 
Keywords—quantum circuit simulation, Graphics Processing Unit (GPU) acceleration, statevector simulation, density-matrix simulation, quantum noise modelling, Kraus map

Cite: Elmin Marevac, Esad Kadušić, Nataša Živić, and Christoph Ruland, "Accelerated Quantum Algorithm Prototyping: Modular Simulation and Noise Modelling with CUDA Support," Journal of Advances in Information Technology, Vol. 17, No. 6, pp. 1188-1210, 2026. doi: 10.12720/jait.17.6.1188-1210

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
