Briefing
This paper does not present a new physics result so much as it documents and motivates a major piece of scientific infrastructure: QUANTUM ESPRESSO, an open-source, modular software suite for electronic-structure calculations and materials modeling. The central “research question” is therefore: how can the physics community best support reproducible, extensible, and high-performance density-functional theory (DFT) simulations for a broad user base, while enabling ongoing algorithmic innovation? This matters because DFT-based workflows are now a primary computational microscope for condensed-matter physics, chemistry, and materials science, and the bottleneck is often not only the underlying theory but also the software’s ability to (i) implement new methods correctly, (ii) scale to modern high-performance computing (HPC) systems, (iii) remain maintainable as features grow, and (iv) allow interoperability so that results are reproducible and extensions are feasible.
Within the broader field, the paper positions QUANTUM ESPRESSO as an alternative to proprietary or monolithic codes. The authors argue that innovation requires “total control” of the software for developers, but widespread adoption requires that the same codebase be robust, user-friendly, and maintainable for production use. They emphasize that modularity is essential for multiscale approaches and for combining different computational tasks (e.g., ground-state DFT, phonons, transport, spectroscopy) within a single interoperable ecosystem. The paper also situates the project within the open-source tradition (explicitly referencing the GNU/Linux model), where a community can validate, benchmark, and extend the software under a coordinated core.
Methodologically, the paper is a software engineering and system-design description rather than an empirical study. It lays out the suite’s design principles, the implemented computational methods, the data formats that enable interoperability, and the parallelization strategies that target massively parallel architectures. The “data sources” are the implemented numerical methods and libraries (e.g., BLAS, LAPACK, FFTW) and the code components (PWscf, CP, PHonon, etc.), rather than experimental or observational datasets. There is no sample size in the statistical sense; instead, the paper provides codebase scale and performance evidence via scalability demonstrations.
Key “findings” are therefore concrete engineering metrics and demonstrated scalability ranges. The authors report that the distribution (at the time of writing) contains about 310,000 lines of Fortran-90, 1,000 lines of Fortran-77, 1,000 lines of C, 2,000 lines of Tcl, and roughly 10,000 lines of documentation, plus external libraries such as FFTW, BLAS, LAPACK, and the iotk toolkit. Overall, the distribution is described as more than 3,000 files across about 200 directories, taking about 22 MB compressed. On the computational side, they claim that the main engines (PWscf and CP) can scale on massively parallel computers “up to thousands of processors,” and they provide specific scalability figures: for medium-size calculations, they show scalability on IBM BlueGene/P and SGI Altix; for large-scale calculations, they show wall time and speedup up to 4,096 processors for cases including PSIWAT (PWscf) and CNT (CP), with systems ranging from hundreds to over a thousand atoms (e.g., an Aβ-peptide fragment in water with 838 atoms and 2,311 electrons; a gold surface with thiols with 587 atoms and 2,552 electrons; and a porphyrin-functionalized nanotube with 1,532 atoms and 5,232 electrons). They also state that preliminary tests indicate scalability up to 65,536 cores using a partial OpenMP parallelization combined with MPI.
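The scalability figures above reduce to simple ratios. As a minimal sketch (the wall times below are invented placeholders, not the paper's measurements), speedup and parallel efficiency relative to a baseline run are computed as:

```python
def speedup(t_ref: float, t_p: float) -> float:
    """Speedup of a parallel run relative to a baseline run."""
    return t_ref / t_p

def efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Fraction of ideal scaling retained when going from p_ref to p processors."""
    return (t_ref / t_p) * (p_ref / p)

# Hypothetical wall times (made up, NOT the paper's data):
t_1024, t_4096 = 800.0, 250.0   # seconds on 1024 and 4096 processors
s = speedup(t_1024, t_4096)                  # 3.2x over the 1024-proc baseline
e = efficiency(t_1024, 1024, t_4096, 4096)   # 0.8 -> 80% parallel efficiency
print(s, e)
```

Reporting efficiency against a large-processor baseline (rather than a single core) is common for systems of this size, since the smallest run that fits in memory may already use hundreds of processors.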
The paper’s most important technical contributions are the modular architecture and the interoperability mechanisms. QUANTUM ESPRESSO is built around DFT with plane waves and pseudopotentials, supporting norm-conserving (NC), ultrasoft (US), and projector-augmented wave (PAW) approaches. It provides multiple executables for different tasks: PWscf for self-consistent DFT using iterative diagonalization and Broyden mixing; CP for Car-Parrinello ab initio molecular dynamics; PHonon for density-functional perturbation theory (DFPT) including second- and third-order derivatives; atomic for generating and testing pseudopotentials; PWcond for ballistic conductance via Landauer–Büttiker scattering; GIPAW for NMR/EPR parameters; XSPECTRA for K-edge X-ray absorption spectra; Wannier90 as an interoperable component for maximally localized Wannier functions; and PostProc and PWgui for analysis and user-facing input construction.
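To make the plane-wave framework concrete: the basis for a given cell consists of all reciprocal-lattice vectors G whose kinetic energy falls below a cutoff, so the basis size is set by the cell volume and the cutoff, not by atom positions. A minimal sketch (simple cubic cell, Rydberg units where a plane wave's kinetic energy is |G|^2 for G in bohr^-1; the cell size and cutoff are illustrative only):

```python
import math

def count_pw(a_bohr: float, ecut_ry: float) -> int:
    """Count plane waves G = (2*pi/a)*(n1, n2, n3) with |G|^2 <= Ecut
    for a simple cubic cell of side a (Rydberg atomic units)."""
    b = 2.0 * math.pi / a_bohr                 # reciprocal-lattice spacing
    nmax = int(math.sqrt(ecut_ry) / b) + 1     # bounding box for the cutoff sphere
    count = 0
    for n1 in range(-nmax, nmax + 1):
        for n2 in range(-nmax, nmax + 1):
            for n3 in range(-nmax, nmax + 1):
                g2 = b * b * (n1 * n1 + n2 * n2 + n3 * n3)
                if g2 <= ecut_ry:
                    count += 1
    return count

# Illustrative: a 10-bohr cubic cell at a 25 Ry wavefunction cutoff.
print(count_pw(10.0, 25.0))
```

The count grows as Ecut^(3/2), which is one reason ultrasoft and PAW pseudopotentials matter in practice: they allow much lower cutoffs than norm-conserving potentials for the same accuracy.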
Interoperability is supported by a specific data-file strategy: simulation outputs are stored as a directory with a small “head” file using XML-like tags for metadata and small arrays, while large arrays (e.g., plane-wave coefficients, charge density, potentials) are stored in separate files referenced by links. For pseudopotentials, the paper introduces the Unified Pseudopotential File (UPF) format, described as XML-like syntax, with converters from other pseudopotential formats.
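The head-file idea can be sketched as follows. The tag and attribute names below are invented for illustration and are NOT the actual QUANTUM ESPRESSO schema; the point is only that small metadata lives in parsable tags while large arrays are reached through file links:

```python
import xml.etree.ElementTree as ET

# Hypothetical head file in the spirit described: small metadata in XML-like
# tags, large arrays stored in separate files referenced by name.
# All tag/attribute names here are made up for illustration.
head = """
<simulation>
  <cell units="bohr" a="10.0"/>
  <cutoff units="Ry" wfc="25.0" rho="200.0"/>
  <charge_density file="charge-density.dat"/>
  <wavefunctions file="evc.dat"/>
</simulation>
"""

root = ET.fromstring(head)
ecut = float(root.find("cutoff").attrib["wfc"])        # small scalar: read inline
rho_file = root.find("charge_density").attrib["file"]  # large array: follow the link
print(ecut, rho_file)
```

This separation lets lightweight tools (converters, analysis scripts) read metadata without touching gigabytes of wavefunction data.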
Limitations are not framed as scientific uncertainties (since the paper is not testing a hypothesis), but the authors implicitly acknowledge constraints typical of DFT software. They note that scalability is limited by problem structure: the number of k-points bounds pool parallelization, and the number of electronic bands bounds linear-algebra parallelization. They also mention that some features are restricted to certain pseudopotential types (e.g., GIPAW currently restricted to NC PPs; third-order derivatives and Raman-coefficients implemented only for NC PPs at the time). Additionally, they state that CP is devised for systems where symmetry is absent or less useful, and thus CP does not exploit symmetry in the same way as PWscf.
Practical implications are central. The paper argues that QUANTUM ESPRESSO should be useful to (1) method developers who need modifiable production-quality code, (2) end users who need robust and efficient tools, and (3) educators who need accessible workflows. The authors describe educational deployment: training at SISSA for graduate students and regular use in undergraduate courses at MIT, including a web-based interface that removes the need for users to manage Unix/Linux job queues. They also describe a cloud/virtual-machine test using Amazon EC2 (Spring 2009) and a broader vision of distributed and service-oriented computing, exemplified by the Vlab cyber-infrastructure, which orchestrates large parameter sweeps (with 10^2–10^4 points in pressure/strain/composition/phonon q-space) using QUANTUM ESPRESSO as a backend.
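The Vlab-style sweeps described above amount to generating a grid of independent backend runs. A minimal sketch (hypothetical axes and values) of how a few small axes multiply into the 10^2–10^4 sweep points mentioned:

```python
from itertools import product

# Hypothetical sweep axes; values are made up for illustration.
pressures_gpa = [0, 10, 20, 30, 40]            # 5 points
strains_pct   = [-2, -1, 0, 1, 2]              # 5 points
compositions  = [0.0, 0.25, 0.5, 0.75, 1.0]    # 5 points

sweep = list(product(pressures_gpa, strains_pct, compositions))
print(len(sweep))   # 125 independent runs from just three 5-point axes

# Each tuple would be rendered into one input file / job for the backend, e.g.:
p, s, x = sweep[0]
job_name = f"run_p{p}_s{s}_x{x}"
```

Because every grid point is an independent calculation, this layer of the workflow is embarrassingly parallel and sits naturally on top of the per-run MPI parallelization.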
Who should care? Practically, condensed-matter physicists, materials scientists, and computational chemists who rely on DFT workflows should care because the paper documents a widely adopted platform that supports many simulation modalities (structural optimization, MD, phonons, transport, spectroscopy) and is designed for reproducibility and extension. HPC practitioners should care because the paper details a hierarchical MPI parallelization scheme (image, pool, plane-wave, task-group, plus linear-algebra parallelization) and reports scalability up to thousands of processors and, preliminarily, tens of thousands of cores with OpenMP+MPI. Finally, software sustainability researchers and research administrators should care because the paper provides a concrete model for open, modular scientific software development under GPL with community-driven validation.
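The hierarchical scheme partitions the total MPI ranks multiplicatively: images split over k-point pools, pools over task groups, task groups over plane-wave workers. A minimal sketch of that bookkeeping (illustrative only, not QE's actual communicator-setup code):

```python
def decompose(nprocs: int, nimage: int, npool: int, ntask: int) -> dict:
    """Split nprocs MPI ranks hierarchically: images -> k-point pools ->
    task groups -> plane-wave workers. Each level must divide evenly.
    Illustrative bookkeeping only, not QE's actual communicator setup."""
    if nprocs % nimage:
        raise ValueError("nimage must divide nprocs")
    per_image = nprocs // nimage
    if per_image % npool:
        raise ValueError("npool must divide ranks per image")
    per_pool = per_image // npool
    if per_pool % ntask:
        raise ValueError("ntask must divide ranks per pool")
    per_task = per_pool // ntask   # ranks sharing the distributed FFT grid
    return {"per_image": per_image, "per_pool": per_pool, "per_task": per_task}

# Hypothetical layout: 4096 ranks as 8 images x 4 pools x 4 task groups.
print(decompose(4096, 8, 4, 4))   # per_image=512, per_pool=128, per_task=32
```

The divisibility constraints make the structural limits concrete: with few k-points there are few pools to fill, and the innermost plane-wave level alone must absorb the remaining ranks.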
Overall, the paper’s core contribution is the articulation and documentation of QUANTUM ESPRESSO as an open-source, modular, interoperable, and HPC-oriented DFT suite, with quantified codebase scale and demonstrated parallel scalability, plus a roadmap for interoperability and excited-state extensions (TDDFT, GW, RPA correlation) and integration with third-party tools via qe-forge and shared data formats.
Cornell Notes
QUANTUM ESPRESSO is presented as an open-source, modular DFT-based software suite designed for reproducibility, extensibility, and high performance on massively parallel computers. The paper details core components (PWscf, CP, PHonon, etc.), interoperable data formats (including UPF pseudopotentials), and hierarchical MPI/OpenMP parallelization strategies, along with examples of scalability and community-driven development via qe-forge.
What problem does the paper aim to solve for the materials simulation community?
It argues that innovation in electronic-structure methods requires software that is both controllable by developers and reliable/user-friendly for production use, which is best achieved through modular, interoperable, open-source infrastructure.
What computational framework does QUANTUM ESPRESSO implement?
DFT using plane-wave basis sets and pseudopotentials, supporting norm-conserving, ultrasoft, and PAW representations, with periodic boundary conditions and supercell approaches.
What are the main software components and what does each do?
PWscf performs self-consistent DFT; CP performs Car-Parrinello ab initio molecular dynamics; PHonon performs DFPT for phonons and response properties; atomic generates and tests pseudopotentials; PWcond computes ballistic conductance; GIPAW computes NMR/EPR-related parameters; XSPECTRA computes K-edge X-ray absorption spectra; Wannier90 computes maximally localized Wannier functions; PostProc and PWgui support analysis and input building.
How does the project support interoperability between modules?
By using common input/output/work data formats and a directory-based simulation data structure with an XML-like “head” file plus linked large-array files; it also introduces UPF as a unified pseudopotential format with converters.
What parallelization strategy is used to achieve HPC performance?
A hierarchical MPI approach with multiple parallelization levels: image parallelization (NEB images), pool parallelization (k-points), plane-wave parallelization (real/reciprocal-space grids), task-group parallelization (FFT distribution), plus separate linear-algebra parallelization using custom algorithms and ScaLAPACK.
What evidence of scalability does the paper provide?
It reports demonstrated scaling up to thousands of processors (e.g., up to 4,096 processors in large-scale examples for PWscf and CP) and preliminary tests indicating scalability up to 65,536 cores using partial OpenMP combined with MPI.
How large is the software distribution described in the paper?
Approximately 310,000 lines of Fortran-90, 1,000 lines of Fortran-77, 1,000 lines of C, 2,000 lines of Tcl, plus documentation and thousands of files; it is described as taking about 22 MB compressed and containing over 3,000 files across about 200 directories.
What limitations or feature restrictions are acknowledged?
Scalability is bounded by problem structure (k-points, bands, algorithmic scaling). Some capabilities are restricted by pseudopotential type (e.g., GIPAW restricted to NC PPs; third-order derivatives and Raman coefficients implemented only for NC PPs at the time).
Review Questions
What design choices make QUANTUM ESPRESSO modular and interoperable, and how do the data formats (directory structure and UPF) support reproducibility?
Describe the hierarchical parallelization levels (image, pool, plane-wave, task-group, linear algebra) and explain what each level distributes.
Which QUANTUM ESPRESSO components correspond to (i) self-consistent ground-state DFT, (ii) ab initio MD, (iii) phonons/response, and (iv) spectroscopy/transport?
What scalability limits does the paper identify, and what performance evidence is provided (processor counts and example system sizes)?
How does the paper’s open-source governance model (GPL + qe-forge) relate to scientific reproducibility and method development?
Key Points
1. QUANTUM ESPRESSO is an open-source, modular DFT suite built around plane waves and pseudopotentials (NC, US, PAW) with periodic boundary conditions and supercell support.
2. The paper’s core contribution is software infrastructure: it emphasizes maintainability, extensibility, and interoperability so that development and production needs converge.
3. Interoperability is enabled by shared data formats, including a directory-based output structure with an XML-like “head” file and linked large-array files, plus the UPF unified pseudopotential format.
4. The distribution is large and production-oriented (reported at ~310,000 lines of Fortran-90, >3,000 files, ~22 MB compressed) and includes many executables for distinct physics tasks.
5. HPC performance is achieved via hierarchical MPI parallelization (image, pool, plane-wave, task-group) plus linear-algebra parallelization using ScaLAPACK/custom routines.
6. The paper reports scalability demonstrations up to thousands of processors (including examples up to 4,096) and preliminary scaling up to 65,536 cores with OpenMP+MPI.
7. The project extends beyond computation into education (web-based interfaces) and community-driven development via qe-forge and interoperability with third-party “quantum engines.”