UNI-FIND

Acceleration of the Relativistic Dirac–Kohn–Sham Method with GPU: A Pre-Exascale Implementation of BERTHA and PyBERTHA

Article
Publication date:
2025
Abstract:
In this paper, we present recent advances in the computation of the Dirac-Kohn-Sham (DKS) method in the BERTHA code. We show that the simple underlying structure of the FORTRAN code favors efficient porting to GPUs, leading to a particularly efficient hybrid CPU/GPU implementation (OpenMP/OpenACC) in which the most computationally intensive part of the DKS matrix evaluation (three-center two-electron integrals evaluated via the McMurchie-Davidson scheme) is offloaded to the GPU via compiler directives based on the OpenACC programming model. This scheme, in combination with a linear algebra library optimized for GPUs (cuBLAS, cuSOLVER), significantly accelerates the DKS calculations. In addition, the low-level integral kernel developed here at the FORTRAN level was used to port our Python-based real-time DKS (RT-TDDKS) implementation (PyBERTHART) to the GPU. The results obtained with a single NVIDIA A100 card on the new Tier-0 EuroHPC supercomputer (LEONARDO) of the CINECA Supercomputing Centre are very satisfactory. Compared to our OpenMP (i.e., CPU-only) parallel implementation with 32 cores, we achieve a speedup of up to 30 for Au16 in a single-point DKS energy calculation and of up to 10 for the Au8 systems in an RT-TDDKS calculation. The approach presented here is very general and, to our knowledge, represents the first port of a Python API to GPUs based on a FORTRAN kernel for the evaluation of two-electron integrals. The implementation is currently limited to a single GPU accelerator, but future paths to an actual exascale implementation are discussed.
CRIS type:
1.1 Journal article
Author list:
Storchi, Loriano; Bellentani, Laura; Hammond, Jeff; Orlandini, Sergio; Pacifici, Leonardo; Antonini, Nicoló; Belpassi, Leonardo
University authors:
STORCHI LORIANO
Link to the full record:
https://ricerca.unich.it/handle/11564/886516
Published in:
JOURNAL OF CHEMICAL THEORY AND COMPUTATION
Journal
Project:
Parallelizzazione e Porting su GPU di Codici Scientifici (Parallelization and GPU Porting of Scientific Codes)