Many scientific and technological developments were achieved in 2021. On the scientific front, they include the implementation of single precision in the Integrated Forecasting System (IFS) and the increase in vertical levels in ensemble forecasts made possible by it. In a second upgrade in 2021, ‘all-sky’ data assimilation was extended to some of the most powerful observations assimilated into the IFS to achieve a good estimate of the current state of the Earth system.
Machine learning made big strides in emulating components of the IFS, such as the gravity wave drag parametrization scheme. We also began to ingest data from Saildrone uncrewed surface vehicles to better understand the behaviour of the Gulf Stream, and we applied the Ensemble of Data Assimilations to the ocean. Finally, we started to use more radiosonde data, and we made additional efforts to assimilate spaceborne radio occultation data.
ECMWF’s supercomputing facility is at the core of our operational and research activities and is upgraded typically every four or five years. In 2021, we set up the new Atos high-performance computing facility (HPCF) in Bologna, Italy, while the previous HPCF continued to be used to produce forecasts.
Progress was also made in the Scalability Programme, set up to ensure we can exploit the full potential of future computing architectures. A project helping to prepare weather prediction for the exascale concluded. Tests were also performed on the use of half precision in the IFS, and progress was made on the European Weather Cloud, a federated cloud computing infrastructure focused on meteorological data, created jointly with EUMETSAT.
This was the first year implementing our new Strategy for the period 2021–2030 as agreed by Member States. A key goal in the science and technology area is to use advanced high-performance computing, big data and artificial intelligence methodologies to continue developing our models into a digital twin of the Earth with a breakthrough in realism. This work will now also contribute to the new EU Destination Earth initiative that was launched at the end of 2021.
From double precision to single precision
The year saw a major change in the precision of the calculations made in the IFS. Many of the previous ‘double precision’ calculations were replaced by ‘single precision’ calculations, which are computationally less demanding. In double precision, each number is stored using 64 bits of memory. This is often more precise than required when observational errors and model approximations are considered. Single precision, in which each number is stored with 32 bits of memory, frees up memory and increases processor speeds.
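The storage difference described above can be illustrated with a minimal sketch that uses only the Python standard library. The array size and the temperature value are hypothetical and chosen purely for illustration; the sketch does not reflect how the IFS itself stores its fields.

```python
import struct
from array import array


def round_to_single(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


n_values = 1_000_000  # hypothetical number of grid-point values

# Storage needed when each number takes 64 bits (8 bytes) or 32 bits (4 bytes).
bytes_double = n_values * array("d").itemsize
bytes_single = n_values * array("f").itemsize

temperature_k = 287.654321987  # hypothetical temperature in kelvin
rounding_error = abs(round_to_single(temperature_k) - temperature_k)

print(f"double precision: {bytes_double} bytes")
print(f"single precision: {bytes_single} bytes")
print(f"rounding error in single precision: {rounding_error:.2e} K")
```

For a value of this size, the rounding error in single precision is of the order of a hundred-thousandth of a kelvin, which is far smaller than typical observational errors.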
In an upgrade of the forecasting system on 11 May 2021, single precision was used both in high-resolution forecasts (HRES) and in ensemble forecasts (ENS). Double precision was still used throughout the process which determines the initial conditions for each forecast. Some calculations in the forecast also still required double precision, but these represented a very small part of the total computational load.
The goal of the implementation of single precision was neutrality in HRES performance scores, as well as major computational cost savings. This was indeed achieved. The cost savings enabled an increase in the number of model levels from 91 to 137 in ENS forecasts.
An illustration of the neutrality in HRES scores is given by track forecasts of Hurricane Laura in August 2020. While agreement cannot be perfect for a chaotic system, the medium-range track differences between single and double precision are much smaller than the spread of the ensemble, which represents the impacts of initial and model uncertainty.
More ensemble vertical levels
The number of vertical levels in ensemble forecasts (ENS) was increased from 91 to 137 in the upgrade of the IFS implemented on 11 May. The 51 members of ENS thus reached the same number of vertical levels as high-resolution forecasts (HRES). The change led to statistically significant improvements of about 0.5–2% in many ENS performance scores.
The change was made without any additional demands on the high-performance computing system. This is because at the same time most of the calculations for HRES and ENS forecasts changed from double precision to single precision.
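A back-of-the-envelope calculation gives a feel for the size of the saving needed to offset the extra levels. It assumes, purely for illustration, that the cost of a forecast scales linearly with the number of vertical levels; the report does not state this, and the real cost model is more complicated.

```python
levels_old = 91
levels_new = 137

# Assumption (not from the report): cost is proportional to the number of levels.
level_ratio = levels_new / levels_old

# Fraction by which the cost per level must fall for the total cost to stay unchanged.
break_even_saving = 1.0 - levels_old / levels_new

print(f"increase in levels: factor {level_ratio:.2f}")
print(f"cost saving per level needed to break even: {break_even_saving:.1%}")
```

Under this simplified assumption, about one and a half times as many levels can be run at no extra cost if the cost per level falls by roughly a third.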
A particular improvement relates to stratospheric temperature scores, which went up by 5–20%. This is partly due to a weaker growth of temperature biases. The reason is that the ENS can better resolve gravity waves in the vertical. The figure shows this improvement at day 10, but it persists into the extended range.
Wider all-sky data assimilation
There has been a trend in recent years to use satellite data in all-sky conditions, in other words including in cloudy and rainy situations. In an upgrade of the IFS to Cycle 47r3 in October 2021, the all-sky strategy was extended to AMSU-A microwave temperature sounding observations.
These observations have been among the most powerful assimilated into the IFS to achieve a good estimate of the current state of the Earth system. The upgrade to all-sky conditions led to an improved fit of that estimate to independent observations, and to an improvement in forecast scores.
An example of the filling of data gaps in cloudy regions is provided by the case of Hurricane Humberto. Coverage is improved in the immediate vicinity of Humberto, which in the figure is indicated by the white circle. It is also more extensive in the wider region, which is likely to be important for influencing Humberto’s subsequent track.
IFS Cycle 47r3 also introduced a wide-ranging moist physics upgrade, which has been described in the previous Annual Report.
Machine learning to emulate parts of the IFS
First steps were taken at ECMWF in 2021 to emulate components of the IFS using machine learning techniques. Machine learning tools can learn to represent complex tasks and dynamics from a large amount of data. Neural networks are particularly promising. They learn by adjusting the strength of connections between a set of neurons during a training period.
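The idea of learning by adjusting connection strengths can be sketched with a very small neural network written in plain Python. The target function below is a hypothetical stand-in for an expensive model component, and the network size, learning rate and number of epochs are arbitrary illustrative choices; none of this reflects the networks actually used at ECMWF.

```python
import math
import random


def target_scheme(x: float) -> float:
    """Hypothetical stand-in for an expensive parametrization scheme."""
    return math.sin(2.0 * x)


def predict(params, x):
    """Return the network output and hidden-layer activations for input x."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2, hidden


def mean_squared_error(params, xs):
    return sum((predict(params, x)[0] - target_scheme(x)) ** 2 for x in xs) / len(xs)


def train(n_hidden=8, epochs=300, learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # input-to-hidden weights
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden-to-output weights
    params = [w1, b1, w2, 0.0]                          # last entry is the output bias
    xs = [-1.0 + 2.0 * i / 49 for i in range(50)]       # training inputs
    loss_before = mean_squared_error(params, xs)
    for _ in range(epochs):
        rng.shuffle(xs)
        for x in xs:
            y, hidden = predict(params, x)
            err = y - target_scheme(x)
            for j in range(n_hidden):
                # Gradient of 0.5 * err**2 with respect to the hidden pre-activation.
                grad_hidden = err * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= learning_rate * err * hidden[j]
                w1[j] -= learning_rate * grad_hidden * x
                b1[j] -= learning_rate * grad_hidden
            params[3] -= learning_rate * err
    return params, loss_before, mean_squared_error(params, xs)


params, loss_before, loss_after = train()
print(f"mean squared error before training: {loss_before:.4f}")
print(f"mean squared error after training:  {loss_after:.4f}")
```

Each pass over the training data nudges the weights in the direction that reduces the mismatch between the network and the target, which is the 'adjusting the strength of connections' described above.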
Such networks were used to emulate the radiation scheme ecRad in collaboration with the technology company NVIDIA, and the gravity wave drag parametrization scheme in collaboration with the University of Oxford. For example, ECMWF successfully built an emulator for the representation of non-orographic gravity waves.
The emulator was not only faster – up to ten times faster if graphics processing units (GPUs) are used – but also better. This was because it was trained from a version of the parametrization scheme with higher fidelity compared to the default scheme used in operations (see the figure). The emulator was also used to build so-called tangent-linear and adjoint versions of the scheme. These were used successfully in data assimilation experiments.
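Why a neural network emulator lends itself to tangent-linear and adjoint versions can be sketched for a single hidden layer. The layer sizes and random weights below are hypothetical. The dot-product test at the end is a standard consistency check between a tangent-linear operator and its adjoint; it is not claimed to be the procedure used in the experiments mentioned above.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def matvec_transposed(matrix, vector):
    n_cols = len(matrix[0])
    return [sum(matrix[i][j] * vector[i] for i in range(len(matrix)))
            for j in range(n_cols)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hidden_state(w1, b1, x):
    """Hidden activations of the toy emulator y = W2 tanh(W1 x + b1)."""
    return [math.tanh(z + b) for z, b in zip(matvec(w1, x), b1)]


def tangent_linear(w1, b1, w2, x, dx):
    """Propagate an input perturbation dx through the emulator linearised about x."""
    h = hidden_state(w1, b1, x)
    dz = matvec(w1, dx)
    dh = [(1.0 - hi * hi) * dzi for hi, dzi in zip(h, dz)]
    return matvec(w2, dh)


def adjoint(w1, b1, w2, x, y_ad):
    """Propagate an output sensitivity y_ad back to the input (transpose of the above)."""
    h = hidden_state(w1, b1, x)
    h_ad = matvec_transposed(w2, y_ad)
    z_ad = [(1.0 - hi * hi) * hai for hi, hai in zip(h, h_ad)]
    return matvec_transposed(w1, z_ad)


rng = random.Random(1)
n_in, n_hidden, n_out = 3, 4, 2  # hypothetical layer sizes
w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
x = [rng.uniform(-1, 1) for _ in range(n_in)]
dx = [rng.uniform(-1, 1) for _ in range(n_in)]
y_ad = [rng.uniform(-1, 1) for _ in range(n_out)]

# Dot-product test: <M dx, y_ad> must equal <dx, M^T y_ad> up to rounding.
lhs = dot(tangent_linear(w1, b1, w2, x, dx), y_ad)
rhs = dot(dx, adjoint(w1, b1, w2, x, y_ad))
print(f"<M dx, y> = {lhs:.12f}")
print(f"<dx, M^T y> = {rhs:.12f}")
```

Because the network consists of matrix products and simple differentiable functions, both operators follow mechanically from the chain rule.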
Saildrone to help ECMWF understand the Gulf Stream
We began to ingest data from Saildrone uncrewed surface vehicles (USVs) in 2021 to better understand the behaviour of the Gulf Stream and its influence on weather. Under an agreement with Google and Saildrone Inc., three solar-powered USVs began to cover a part of the Gulf Stream that is crucial for forecasts.
One of them was stationed east of the US states of North Carolina and Virginia, where the current is relatively narrow, a second was stationed mid-stream, and the third was stationed downstream off the Grand Banks of Newfoundland where the current broadens, meanders, and breaks into eddies. The three USVs were set to sail back and forth across the current to capture as many ocean features as possible.
Real-time measurements of atmospheric conditions and of ocean conditions to a depth of 100 metres are intended to provide detailed data to support the modelling of this powerful ocean current. The Gulf Stream in the northwest Atlantic Ocean is important to ECMWF because of the magnitude of ocean temperature errors caused by mispositioning it. The goal is to improve weather forecasts at all timescales, from medium-range to extended-range forecasts.
Ensemble of Data Assimilations in the ocean
A project to improve ocean model initialisation for the EU-funded Copernicus Climate Change Service (C3S) run by ECMWF ended in July 2021. It resulted in a big improvement of ocean data assimilation capabilities at ECMWF. The basic idea was to apply the Ensemble of Data Assimilations (EDA) to the ocean rather than just to the atmosphere. Experts from ECMWF Member States as well as ECMWF worked on the project.
The classical EDA helps to determine the initial conditions for ECMWF’s ensemble forecasts and its higher-resolution deterministic forecast. The ocean work aimed in particular to develop efficient ensemble-based models of the errors of short-term ocean forecasts. An example of the results can be seen in the figure. The assimilation of sea-surface height was also improved, and capabilities to assimilate sea-surface temperature were developed.
These developments will enhance the quality of service of C3S products and ECMWF operations. They will also be the basis for the ocean data assimilation used in OCEAN6 and ERA6, the next generation of ocean and coupled reanalysis.
More radiosonde data
In May 2021, a number of ships started reporting radiosonde descent data. Most of these were from the European Automated Shipboard Aerological Programme (ASAP) in the North Atlantic (see the map). Four of these ships were using RS41 radiosondes, and the descent quality looked good. The data are useful to help determine the initial conditions of weather forecasts.
ECMWF started using the data operationally on 8 September 2021. Data use is restricted to pressures greater than 150 hPa, as for land stations, to avoid very fast fall rates. These radiosondes have pressure sensors but not parachutes. They use smaller balloons than most land stations, and the temperature bias problems at upper levels are slightly smaller because they do not go as high.
On average, radiosonde profiles from ships have more impact than land profiles because they are in data-sparse regions. At land stations, the descent reports often stop some kilometres above the surface due to hills blocking the radiosonde signal, but for ships the descent reports can get very close to the surface.
Assimilating Spire and COSMIC-2 data into the IFS
During the COVID-19 pandemic, additional efforts were made to assimilate spaceborne radio occultation data. Such data are derived from radio signals that are sent by Global Navigation Satellite System (GNSS) satellites and measured with a receiver on a satellite in Low Earth Orbit.
The path of the radio signal is bent as a result of refractive-index gradients in the atmosphere. This makes it possible to derive information on temperature and humidity profiles.
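The link between the refractive index and temperature and humidity can be sketched with the widely used Smith-Weintraub approximation for atmospheric refractivity, N = 77.6 p/T + 3.73e5 e/T², with pressure p and water vapour pressure e in hPa and temperature T in kelvin. The report does not say which formulation or coefficients are used in the IFS, so the function below is an illustrative assumption, and the input values are hypothetical.

```python
def refractivity(pressure_hpa: float, temperature_k: float,
                 vapour_pressure_hpa: float) -> float:
    """Refractivity N (dimensionless) from the Smith-Weintraub approximation."""
    dry_term = 77.6 * pressure_hpa / temperature_k
    moist_term = 3.73e5 * vapour_pressure_hpa / temperature_k ** 2
    return dry_term + moist_term


def refractive_index(n_units: float) -> float:
    """Refractive index n from refractivity N, using N = (n - 1) * 1e6."""
    return 1.0 + 1.0e-6 * n_units


# Hypothetical near-surface conditions.
n_surface = refractivity(pressure_hpa=1000.0, temperature_k=288.0,
                         vapour_pressure_hpa=10.0)
print(f"refractivity: {n_surface:.1f} N-units")
print(f"refractive index: {refractive_index(n_surface):.6f}")
```

Since refractivity depends on pressure, temperature and humidity, vertical gradients in these quantities produce the refractive-index gradients that bend the signal path.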
A study of assimilating GNSS radio occultation data from commercial data provider Spire Global and from COSMIC-2 satellites into the IFS was conducted. It found that assimilating the data improves medium-range and short-range forecasts.
The results were achieved by running a set of observing system experiments (OSEs). In addition, we ran corresponding Ensemble of Data Assimilations (EDA) experiments. The experiments investigated the relationship between EDA spread estimates and OSE short-range forecast error statistics. A reasonably good agreement between the EDA spread and the OSE error statistics was found, suggesting that the EDA technique is a useful method for assessing future observing systems.
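The kind of comparison between ensemble spread and forecast error described above can be sketched generically as follows. All numbers are invented for illustration, and the two statistics shown (root-mean-square spread about the ensemble mean, and root-mean-square error of the ensemble mean) are common choices rather than the specific diagnostics used in the study.

```python
import math
import statistics

# Hypothetical short-range temperature forecasts (K):
# one row per location, one value per ensemble member.
ensemble = [
    [271.2, 271.9, 270.8, 271.5, 271.1],
    [280.4, 279.6, 280.9, 280.1, 279.8],
    [265.0, 266.1, 265.4, 264.7, 265.6],
    [290.3, 290.0, 289.5, 290.8, 290.2],
]
# Hypothetical verifying values at the same locations.
verification = [271.6, 279.5, 265.9, 290.4]


def ensemble_spread(members_by_location):
    """Root-mean-square spread: square root of the mean sample variance."""
    variances = [statistics.variance(members) for members in members_by_location]
    return math.sqrt(statistics.mean(variances))


def rms_error_of_mean(members_by_location, truth):
    """Root-mean-square error of the ensemble mean against verifying values."""
    squared = [(statistics.mean(members) - value) ** 2
               for members, value in zip(members_by_location, truth)]
    return math.sqrt(statistics.mean(squared))


spread = ensemble_spread(ensemble)
error = rms_error_of_mean(ensemble, verification)
print(f"ensemble spread: {spread:.2f} K")
print(f"RMS error of ensemble mean: {error:.2f} K")
print(f"spread/error ratio: {spread / error:.2f}")
```

A spread that tracks the error statistics across different observing system configurations is what makes an ensemble approach informative about the value of observations.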
High-performance computing: the Atos system
ECMWF’s world-class high-performance computing facility (HPCF) is at the core of its operational and research activities and is upgraded typically every four or five years.
At the start of 2020, we signed a contract with Atos worth over 80 million euros for a new facility comprising four Atos Sequana XH2000 clusters. It will deliver about five times the performance of the current system, allowing us to run higher-resolution ensemble forecasts to improve the prediction of extreme weather events significantly ahead of time. The Atos clusters were installed in our new data centre in Bologna, Italy, during 2021.
The Test and Early Migration System (TEMS) component of the Atos system was initially installed in the Reading data centre, to provide a platform for developers through the second half of 2020 and first half of 2021. This system was generally available to internal and external users of the HPCF and ECGATE. Several training courses were provided by Atos and ECMWF staff to introduce users to the new system.
By the end of 2021, Atos had completed delivery of the hardware components of the new HPCF, building three of the clusters and all the storage systems. The final cluster was on schedule, with only the cabling of the high-performance interconnect remaining.
In August, one of the clusters was used for the Site Functional Test. The purpose of the test was to validate the performance targets on the benchmark suite that Atos had committed to in its plan, including demonstrating significant improvements to the metadata performance of the Lustre file systems. This stage identified two additional issues with the Lustre file system, one of which was resolved in 2021 while the other remained under live investigation. The user acceptance test phase started in August 2021 and was continuing at the end of the year.
High-performance computing: the 2021 system
The Cray XC40 HPCF continued to provide a good and stable service in 2021 with excellent availability on both systems of over 99.9%. Forecast production was uninterrupted during the COVID-19 pandemic.
Despite this overall highly satisfactory performance, there were some incidents with the Cray machines, reflecting the age of the systems. A major problem occurred in June 2021, which led to a severe delay in the operational suite. The issue was caused by a hardware failure on one of the research filesystems, which should have had no impact on the operational forecast suite. However, during the running of the Ensemble of Data Assimilations, operational jobs began to overrun significantly. Resolving this required a full reboot of the system. In recognition of the operational risks, monitoring on the system was enhanced to detect similar situations.
Data Handling System
The Data Handling System (DHS) provided a generally reliable service over the year, with a data archive of 442 petabytes of primary data and 166 petabytes in the secondary data store at the end of the year. With the management of significant volumes of data, some problems were experienced with the tape libraries that were visible in the MARS (Meteorological Archival and Retrieval System) and ECFS (ECMWF’s File Storage) services.
The testing of Spectra Logic libraries was completed with mixed success. In the Bologna data centre, the plan was to continue with a mixed IBM/Spectra Logic environment, primarily using shorter IBM libraries. The Spectra Logic tape library was to be retained in its role as a production library for primary data and as a means of monitoring the development of software.
Data centre infrastructure
Due to the design and management of the cooling service, problems with the data centre infrastructure did not impact operations. One major failure during this time affected a compressor in one of the chillers, which required a complete rebuild. The diesel rotary uninterruptible power supply (DRUPS) machine KS3 also suffered a failure that caused it to switch to bypass. The repair was carried out and full operations were restored in two days without impacts on service delivery.
The Hybrid2024 project
During the first half of 2021, the Hybrid2024 project was developed as part of the Centre’s Scalability Programme. This programme aims to adapt weather prediction codes to emerging computing paradigms.
The Hybrid2024 project aims to prepare the IFS for HPC accelerators. Emphasis was placed on creating a full accelerator-enabled multi-architecture IFS in which scientists can develop efficiently. At the same time, the aim is to ensure that the code is portable and performant on current and future accelerator-based architectures. The current IFS is heavily optimised for deep-cache central processing unit (CPU) architectures and needs a significant re-design to take advantage of technologies like graphics processing units (GPUs). This will be achieved through:
- developing programming approaches and tooling for a multi-architecture paradigm
- restructuring the code and supporting infrastructure accordingly
- demonstrating the IFS on available GPU technologies in advance of the next HPC procurement; and
- monitoring HPC accelerator development beyond current GPU technology.
The IFS restructuring is based on the concept of using build-time specialisation to bridge the gap between a single, maintainable source base that scientists can develop efficiently, and code that is portable and performant on current and future accelerator-based architectures. Individual components of the code will be adapted into separate libraries with clean Application Programming Interfaces (APIs), which ultimately separates scientific developments from the layout and placement of data in memory. Then, source-to-source code generation via the in-house Loki tool will be used to produce highly optimised, bespoke kernels for desired target architectures.
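The principle of generating architecture-specific code from a single source can be illustrated with a toy generator in Python. It does not use Loki and does not reflect Loki's interface; the kernel body, the target names and the loop orderings are all hypothetical and serve only to show one science expression being wrapped in different loop nests.

```python
# Single, architecture-neutral statement of the science (hypothetical).
KERNEL_BODY = "out[col][lev] = 2.0 * field[col][lev] + 1.0"


def generate_kernel(target: str) -> str:
    """Emit Python source with a loop nest chosen for the given target."""
    if target == "cpu":
        loops = ["for col in range(n_cols):", "for lev in range(n_levs):"]
    elif target == "accelerator":
        # Illustrative alternative ordering; real generators apply far richer transformations.
        loops = ["for lev in range(n_levs):", "for col in range(n_cols):"]
    else:
        raise ValueError(f"unknown target: {target}")
    lines = ["def kernel(field, out, n_cols, n_levs):"]
    indent = "    "
    for loop in loops:
        lines.append(indent + loop)
        indent += "    "
    lines.append(indent + KERNEL_BODY)
    return "\n".join(lines) + "\n"


def build_kernel(target: str):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(generate_kernel(target), namespace)
    return namespace["kernel"]


def run_kernel(target: str):
    n_cols, n_levs = 2, 3
    field = [[float(col * 10 + lev) for lev in range(n_levs)] for col in range(n_cols)]
    out = [[0.0] * n_levs for _ in range(n_cols)]
    build_kernel(target)(field, out, n_cols, n_levs)
    return out


print(generate_kernel("cpu"))
print(generate_kernel("accelerator"))
print(run_kernel("cpu") == run_kernel("accelerator"))
```

The scientific statement is written once, while the surrounding loop structure is produced separately for each target, so that results stay identical and only the execution pattern changes.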
Hybrid2024 has many synergies and links, most obviously with IFS-ARPEGE, through which significant amounts of code are shared between ECMWF and Météo-France and in turn with the many members of the wider ACCORD consortium. Close coordination will ensure continued alignment of the major developments to the code. The primary role of Hybrid2024 is to coordinate the technical development of accelerator capabilities between different projects.
The project is targeting a significant re-design of the code and supporting infrastructure. It aims to increase the flexibility of the IFS through the use of accelerator-enabled libraries and data structures, as well as build-time specialisation through automated source-to-source translation. Hybrid2024 aims to develop, maintain and evaluate IFS accelerator capabilities on available GPU technologies. It will prepare the ground for the next HPC procurement.
The Atlas software library
Atlas is a software library developed and maintained at ECMWF for the purpose of abstracting complex grid data structures and the associated parallelisation concerns. It has been made publicly available on the GitHub software platform. During recent years, we have seen increasing interest from the wider numerical weather prediction (NWP) community, and ECMWF Member States, in adopting Atlas in their software. Moreover, Member States have started to collaborate with ECMWF to improve or implement missing features that stand in the way of their adoption of Atlas.
The ORCA tripolar grid is the ocean grid used by the NEMO ocean model. Due to the very specialised nature of the ORCA grid, special functionality was added to Atlas as an external library, ‘atlas-orca’, using a newly developed Plugin architecture. This serves as a great example of Atlas’s flexible and extensible software design.
The figure shows the flexibility of Atlas by using its own native functions to represent and remap a potential temperature field from the ORCA grid to a standard latitude-longitude grid.
First-order and second-order conservative remapping methods between arbitrary spherical meshes supported by the Atlas library were developed. Further work to finalise this effort and incorporate it in the IFS infrastructure remained to be completed. The implemented method works well with a variety of meshes, both structured and unstructured. Second-order remapping preserves the smoothness of the function much better on higher-resolution target meshes. Experiments show that the second-order method is ten times more accurate in the remapping step.
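The difference between first-order and second-order conservative remapping can be sketched in one dimension. This is a simplified analogue on a periodic unit interval with a nested target mesh, not the spherical method implemented in Atlas, and the test function and mesh sizes are hypothetical. First order copies each source cell mean to the target cells it contains; second order adds a linear reconstruction whose contributions cancel within each source cell, so both conserve the integral.

```python
import math


def exact_cell_average(a: float, b: float) -> float:
    """Exact mean of f(x) = 2 + sin(2*pi*x) over the cell [a, b]."""
    two_pi = 2.0 * math.pi
    return 2.0 + (math.cos(two_pi * a) - math.cos(two_pi * b)) / (two_pi * (b - a))


def cell_averages(n_cells: int):
    width = 1.0 / n_cells
    return [exact_cell_average(i * width, (i + 1) * width) for i in range(n_cells)]


def remap(source, refine: int, second_order: bool):
    """Conservatively remap cell means to a mesh that is 'refine' times finer."""
    n_src = len(source)
    dx_src = 1.0 / n_src
    dx_tgt = dx_src / refine
    target = []
    for i, mean in enumerate(source):
        slope = 0.0
        if second_order:
            # Central-difference slope with periodic neighbours.
            slope = (source[(i + 1) % n_src] - source[i - 1]) / (2.0 * dx_src)
        for k in range(refine):
            # Offset of the target cell centre from the source cell centre.
            offset = (k + 0.5) * dx_tgt - 0.5 * dx_src
            target.append(mean + slope * offset)
    return target


def max_error(remapped):
    exact = cell_averages(len(remapped))
    return max(abs(r - e) for r, e in zip(remapped, exact))


source = cell_averages(16)
first = remap(source, refine=4, second_order=False)
second = remap(source, refine=4, second_order=True)

print(f"mean of source field:  {sum(source) / len(source):.12f}")
print(f"mean after 1st order:  {sum(first) / len(first):.12f}")
print(f"mean after 2nd order:  {sum(second) / len(second):.12f}")
print(f"max error, 1st order:  {max_error(first):.4f}")
print(f"max error, 2nd order:  {max_error(second):.4f}")
```

In this toy setting the first-order result is piecewise constant across each source cell, whereas the second-order result follows the gradient of the field, which is why it is smoother and more accurate on a finer target mesh.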
Novel programming approaches
During the second half of 2021, work continued on developing novel programming approaches and tools for a multi-architecture paradigm. Significant progress was made towards GPU-enabled physics parametrizations. Developments were also under way to further utilise accelerator-enabled data structures and to integrate GPU-enabled versions of the spectral transform library (TRANS) into the IFS release cycle. The goal was to prepare for and evaluate hybrid CPU–GPU execution. These efforts were also supported by the Atos/ECMWF Centre of Excellence and were planned to be further boosted by Destination Earth, as well as various European projects.
The IFS spectral transforms library was the first to see a working vendor-specific port developed. In the previous year, we created a first hybrid version of RAPS18 IFS, which was able to perform a low-resolution forecast run by computing the spectral transform on NVIDIA GPUs. In 2021, we were able to run this version at 1 km resolution on the Summit supercomputer.
Rewriting the finite-volume module
A comprehensive rewrite of the finite-volume module (FVM), a non-hydrostatic dynamical core under development, was in progress in 2021. It was based around the GT4Py (GridTools for Python) domain-specific language (DSL) framework, developed by our close partners at CSCS (Swiss National Supercomputing Centre) and ETH Zurich.
An important milestone reached was the GT4Py DSL implementation of the FVM on a 3D structured grid with an additional limited-area model (LAM) configuration. Adopting GT4Py enabled clear and flexible programming with emphasis on the physical and numerical model aspects. Both GPU and CPU architectures can be targeted by the same Python-like frontend code. In addition to portability and targeted optimisation, this design promised enhanced scientific productivity.
Changing to single precision in the medium-range forecast model allowed a higher vertical resolution for no extra cost, resulting in a boost in forecast scores. By gaining access to the world’s top two supercomputers, Fugaku and Summit, ECMWF was able to test half precision natively for the first time. Tests of spectral transform code on Fugaku indicated that half precision can be safely used up to the resolution of the current operational high-resolution forecast.
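How much accuracy is given up at each precision level can be illustrated with the Python standard library, which supports 16-bit, 32-bit and 64-bit IEEE floating-point formats through the `struct` module. The sample value is arbitrary, and the sketch says nothing about how half precision was applied in the spectral transform tests.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given floating-point format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def relative_error(value: float, fmt: str) -> float:
    return abs(round_trip(value, fmt) - value) / abs(value)


sample = 0.1  # arbitrary value that is not exactly representable in binary

# struct format codes: 'e' = 16-bit half, 'f' = 32-bit single, 'd' = 64-bit double.
formats = (("half", "e"), ("single", "f"), ("double", "d"))
errors = {name: relative_error(sample, fmt) for name, fmt in formats}
sizes = {name: struct.calcsize(fmt) for name, fmt in formats}

for name, _ in formats:
    print(f"{name:>6}: {sizes[name]} bytes, relative error {errors[name]:.2e}")
```

Each halving of the storage size costs several decimal digits of accuracy, which is why the use of half precision has to be tested carefully for each part of the model.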
The coupling model framework between the NEMO ocean model and the atmosphere and wave models was made precision agnostic, allowing fully single-precision coupled forecasts for the first time. Forecast testing on this coupled system started at a low resolution. Initial results were encouraging, with very minor differences in skill so far for extended-range ensemble forecasts. Preliminary scalability tests were carried out on the Atos Test and Early Migration System. At the operational 0.25° resolution, single precision resulted in a speed-up by a factor of 1.7 on this system for the ocean part.
European Weather Cloud
In December 2018, ECMWF’s Council approved a pilot project to create, jointly with EUMETSAT, a federated cloud computing infrastructure focused on meteorological data. This European Weather Cloud was designed to continue to make the results of the Centre’s research and operations available to its Member States in the most appropriate form.
In late 2020, a Tender Evaluation Board (TEB) was formed in order to agree the strategic procurement approach for the necessary equipment/service. The TEB agreed to initially seek market feedback, and then to follow this up with an invitation to tender (ITT).
To this end, an invitation to pre-qualify was published. This included a number of minimum requirements, but also allowed responders to propose their own recommended technical solutions, as well as indicative pricing. Responders were advised that their responses would be evaluated for short-listing purposes for the subsequent ITT. A good number of responses were received, and in December 2021 ECMWF’s Council authorised ECMWF to issue a tender.