The scientific advances required to meet the critical need for more accurate forecasts, particularly of extreme weather, call for an increase in supercomputing power of several orders of magnitude (exascale computing), coupled with much greater energy efficiency. To address this challenge, ECMWF embarked in 2013 on a major rethink of its model development, known as the Scalability Programme.
A key idea at the heart of this collaborative programme is to prepare the Centre’s model, the Integrated Forecasting System, for the new computing architectures emerging on the path to exascale machines. Consultation and collaboration with Member States and partners from the public and commercial sectors are vital, from workshops to define the programme scope through to development and testing.
The Scalability Programme comprises a series of projects, one of which, the EU-funded ESCAPE project coordinated by ECMWF, came to an end in 2018. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The findings presented below are extracted from the end-of-project report produced by the ESCAPE team.
Exascale computing systems are being developed, but current numerical weather prediction (NWP) models cannot fully exploit them, because the NWP software is not adequately adapted to make the most of the faster and more energy-efficient hardware: the so-called “scalability” challenge. Changes are needed throughout the entire NWP processing chain. It is a huge task, but the ESCAPE project, which ran from 2015 to 2018, has achieved major steps forward.
Key to progress has been dividing the huge volume of code within the ECMWF forecasting model into smaller and more manageable elements known as “dwarfs”. Each dwarf performs a particular function within the model, such as modelling cloud micro-physics, and comes with specific computational patterns for accessing processor memory and data communication.
The creation of dwarfs is a prerequisite for any subsequent co-design, optimisation, and adaptation efforts. The dwarfs have enabled high-performance computing centres, research groups and hardware vendors to focus on specific aspects of performance for which code restructuring and adaptation to novel processor architectures are more straightforward.
Teams worked to adapt and optimise the dwarfs for different types of Intel CPU and NVIDIA GPU processors. A new technique was also explored using an optical processor, which encodes information into a laser beam by adjusting the magnitude and phase at each point of the beam. This approach is particularly suited to performing Fourier transforms within the model’s dynamics scheme.
Efficiency gains of up to 40% were achieved for spectral transforms (fundamental to the model dynamics scheme) on CPUs. Code optimisation for GPUs delivered speed-up factors of about 10 to 50 on a single node, and of 2 to 3 when deployed on multiple GPUs with NVSwitch interconnect (used to communicate between GPUs).
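To give a feel for what a spectral transform does, the following is a minimal one-dimensional sketch in Python. It is a toy written for this article, not code from the Integrated Forecasting System or from any ESCAPE dwarf: it uses a naive discrete Fourier transform on a periodic line, and the function names `dft`, `inverse_dft` and `spectral_derivative` are hypothetical. The idea it illustrates is that a field is moved from grid-point values to wave coefficients, a derivative becomes a simple multiplication in that space, and the result is transformed back.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) transform from grid-point values to spectral coefficients."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Naive O(n^2) transform from spectral coefficients back to grid points."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n))
        for j in range(n)
    ]


def spectral_derivative(values):
    """Differentiate a periodic field sampled on [0, 2*pi).

    In spectral space the derivative of mode k is i*k times its coefficient,
    so the work is two transforms plus one multiplication per mode.
    """
    n = len(values)
    coeffs = dft(values)
    # Map array index k to a signed wavenumber: 0, 1, ..., then negative modes.
    wavenumbers = [k if k < n // 2 else k - n for k in range(n)]
    derived = [1j * w * c for w, c in zip(wavenumbers, coeffs)]
    return [z.real for z in inverse_dft(derived)]


if __name__ == "__main__":
    n = 16
    grid = [2 * math.pi * j / n for j in range(n)]
    field = [math.sin(x) for x in grid]
    slope = spectral_derivative(field)
    # The derivative of sin(x) is cos(x); report the largest grid-point error.
    print(max(abs(s - math.cos(x)) for s, x in zip(slope, grid)))
```

The two transforms dominate the cost, which is why they are a natural target for optimisation. A production model would use fast transform algorithms rather than the quadratic-cost loops shown here.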
Domain-specific languages (DSLs) were also demonstrated as a very promising tool to enable good performance on multiple architectures with a single code. However, designing a user-friendly DSL that is shared by many dwarfs, whilst delivering good performance on each architecture, is a challenge.
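The single-source idea behind a DSL can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not one of the DSLs evaluated in ESCAPE: the names `LAPLACIAN_1D`, `run_pointwise`, `run_shifted` and `apply_stencil` are hypothetical, and the two "backends" are plain Python stand-ins for what would in practice be generated CPU and GPU code. The scientific operation is declared once as data, and each backend is free to choose its own loop order.

```python
# A stencil is declared once, as a mapping from grid offset to weight.
# This declaration is the only place the "science" is written down.
LAPLACIAN_1D = {-1: 1.0, 0: -2.0, 1: 1.0}


def run_pointwise(stencil, field):
    """Backend A: outer loop over grid points, inner loop over stencil entries."""
    n = len(field)
    return [
        sum(w * field[(i + off) % n] for off, w in stencil.items())
        for i in range(n)
    ]


def run_shifted(stencil, field):
    """Backend B: outer loop over stencil entries, whole-field updates inside.

    This loop order mimics an array-at-a-time style of execution.
    """
    n = len(field)
    out = [0.0] * n
    for off, w in stencil.items():
        shift = off % n
        # shifted[i] equals field[(i + off) % n] on a periodic domain.
        shifted = field[shift:] + field[:shift]
        out = [acc + w * v for acc, v in zip(out, shifted)]
    return out


# Hypothetical registry: the user picks a backend, not an implementation.
BACKENDS = {"pointwise": run_pointwise, "shifted": run_shifted}


def apply_stencil(stencil, field, backend="pointwise"):
    """Run the same stencil declaration on the chosen backend."""
    return BACKENDS[backend](stencil, field)


if __name__ == "__main__":
    field = [0.0, 1.0, 4.0, 9.0, 16.0]
    for name in BACKENDS:
        print(name, apply_stencil(LAPLACIAN_1D, field, backend=name))
```

Both backends return the same values from the same declaration, which is the property a DSL aims to guarantee. The difficulty noted above shows up even in this toy: the declaration format has to be expressive enough for many different dwarfs while still leaving each backend room to optimise.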
ECMWF was successful in securing funding to continue the pioneering work of the ESCAPE project. ESCAPE-2 began in 2018 and will run until 2021. It will extend the work on dwarfs to other models, such as the German national meteorological service’s ICON model and the community ocean model NEMO. It will also develop benchmarks that represent the computing and data handling patterns of weather and climate models more realistically, and are thus more suitable for assessing the performance of future HPC systems.
Importantly, ESCAPE-2 will look at re-assembling the dwarfs created in ESCAPE whilst maintaining the efficiency gains obtained so far. Both projects incorporate the expertise of leading European regional forecasting consortia, university research, experienced high-performance computing centres and hardware vendors.
The ESCAPE and ESCAPE-2 projects have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no 671627 and grant agreement no 800897, respectively.