The Centre’s Strategy says that by 2025 ensemble forecasts should use a much smaller grid spacing than the current 18 km. Data assimilation methods will need to follow suit to provide accurate initial conditions at such scales. To make this possible, the Centre’s next high-performance computing facility (HPCF) will have to be significantly more powerful than the current one. Much of the groundwork for a new HPCF was laid in 2018. The process started when Council approved the budget for the next generation of supercomputers at its December 2017 session.

Sustaining high-performance computing

© agsandrew/iStock/Getty Images Plus.

The project, known as HPC2020, kicked off with a request for information to vendors. This formed part of the business case and aimed to ensure that we get the most suitable technology and the best value for money. We also started to set up a framework contract to purchase the computer storage systems that will support both the headquarters in Reading and the new data centre in Bologna. The invitation to tender for ECMWF’s next HPCF was released on 12 November 2018.

The increase in power will have to go hand in hand with efficiency gains from the Scalability Programme. The programme promotes an integrated approach to code development, with active participation from ECMWF Member States through membership of programme and project boards and through direct partnership in internal and externally funded projects. The EU-funded ESCAPE project is part of the programme and was brought to a successful conclusion in 2018, while other EU-funded scalability-related projects, such as ESCAPE-2, EPiGRAM-HS and MAESTRO, got under way.

As part of ESCAPE, 2018 saw demonstrations of efficiency gains for selected model components (‘dwarfs’) on a range of processor architectures (CPUs, GPUs and optical processors). Efficiency gains for product generation were also demonstrated on Intel 3D XPoint NVRAM technology. After four years of progress, it was time to envisage the next stage for this ambitious programme, and its different strands were brought together under the banner ‘ExtremeEarth’. This programme will build on expertise from its consortium members and from national and international partners to deliver application-orientated solutions to the environmental extremes affecting the planet today. The aim is to use extreme computing to tackle environmental extremes and their impacts on society.

In the summer, more than 90 data enthusiasts converged on the Centre for a hackathon weekend to develop innovative climate data applications. They tested the capabilities of the Climate Data Store (CDS), a facility that provides free and open access to climate data and information. The CDS has been developed by the EU-funded Copernicus Climate Change Service (C3S), which is implemented by ECMWF, and it was publicly released in June 2018.

In November 2018, the US Department of Energy’s (DOE) Office of Science announced the 62 computational research projects awarded allocations of supercomputer time under its Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. ECMWF was among the INCITE 2019 award winners, with a project called ‘Unprecedented scales with ECMWF’s medium-range weather prediction model’.

In 2018, ECMWF reviewed the technologies it uses to archive data. As a result, we successfully ran a competitive tender for the procurement of a new tape library complex, which will provide the foundation for our data archive.

HPC workshop

The 18th workshop on high-performance computing in meteorology took place at ECMWF in September 2018. It focused on the scalability and I/O of weather and climate applications on HPC systems. Scalability was the key theme for two reasons: the first milestones of the Scalability Programme are nearing completion, and, with the procurement of our next supercomputer approaching, new technologies that offer better ways to achieve scalability have become important.

The HPC workshop provided a unique forum that brought together operational centres concerned with running cost-effective forecasting systems on affordable HPC infrastructures; research teams exploring cutting-edge methodologies and novel technologies for future solutions; and HPC industry representatives interested in providing the most suitable technological solutions for an application community with enormous socio-economic impact: weather and climate prediction. The workshop programme included programmatic overview talks, expert presentations and panel discussions on key topics such as the European Roadmap towards Exascale and the Convergence of HPC and the Cloud.

Workshop 2018
Vendor exhibition at the ECMWF HPC workshop.

A significant step towards exascale computation: NEXTGenIO

One of the major roadblocks to achieving exascale computing (1000x faster than current petascale systems) is the I/O bottleneck. Current systems are capable of processing data quickly, but speeds are limited by how fast they can read and write data. This represents a significant loss of time and energy. Being able to widen, and ultimately eliminate, this bottleneck would significantly increase the performance and efficiency of HPC systems.

The EU-funded NEXTGenIO project (2015–2019) aims to solve this problem by bridging the gap between memory and storage using Intel® Optane™ Data Center Persistent Memory Modules, which will sit between conventional memory and disk storage. NEXTGenIO has been working on the required hardware and software to exploit this new memory technology. The goal is to build a system with 100x faster I/O than current HPC systems, a significant step towards exascale computing.
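To make the bottleneck argument concrete, a deliberately simplified model (an illustrative assumption, not a NEXTGenIO result) treats run time as compute time plus I/O time, with no overlap between the two. Here V is the volume of data read and written and B is the effective I/O bandwidth; the 100x I/O target corresponds to replacing B with 100B:

```latex
% Simplified model (assumption): compute and I/O do not overlap.
T = T_{\mathrm{compute}} + \frac{V}{B}
\qquad\longrightarrow\qquad
T' = T_{\mathrm{compute}} + \frac{V}{100\,B}
```

Under this model the I/O term shrinks a hundredfold, so for an I/O-bound workflow (V/B much larger than the compute time) total run time approaches the compute time alone. Real systems overlap compute and I/O to some degree, so actual gains depend on the workload.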

In 2018, ECMWF’s participation in NEXTGenIO led to developments that made it possible to refactor the I/O stack of ECMWF’s weather model and prepare it for future technologies.

ECMWF developed the Kronos Workload Simulator, a key part of the software package for the NEXTGenIO system. Kronos generates and executes workloads that are representative of the real-life computational workloads of HPC centres, in a highly controlled and easily portable way. It also aims to generate these workloads automatically, based on an analysis of data collected from measured operational workloads. In 2018 this software was used to assess the requirements for the next HPC procurement.
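How Kronos derives its workloads is not described here. As a purely illustrative assumption, one way to turn measured job profiles into a compact, representative synthetic workload is to cluster the profiles and emit one synthetic job per cluster centroid, in proportion to cluster size. The sketch below uses plain k-means; `JobProfile`, `kmeans` and `synthesise`, and the three measured quantities, are hypothetical and are not part of the Kronos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical per-job measurements; not Kronos' actual data model."""
    cpu_seconds: float
    gib_read: float
    gib_written: float

    def vector(self):
        return (self.cpu_seconds, self.gib_read, self.gib_written)


def _dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    # Component-wise mean of a non-empty list of equal-length tuples.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def _assign(points, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda c: _dist2(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=50):
    """Lloyd's k-means, seeded with the first k points so results are deterministic."""
    centroids = [tuple(float(x) for x in points[i]) for i in range(k)]
    for _ in range(iterations):
        assignment = _assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            # An empty cluster keeps its previous centroid.
            updated.append(_mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, _assign(points, centroids)


def synthesise(jobs, k, n_synthetic):
    """Build a synthetic workload of roughly n_synthetic jobs.

    Each synthetic job is a cluster centroid; clusters contribute jobs in
    proportion to how many measured jobs they contain (rounded, so the
    total can differ slightly from n_synthetic).
    """
    points = [j.vector() for j in jobs]
    centroids, assignment = kmeans(points, k)
    workload = []
    for c, centroid in enumerate(centroids):
        share = assignment.count(c) / len(jobs)
        workload.extend([JobProfile(*centroid)] * round(n_synthetic * share))
    return workload
```

Replaying a handful of centroid jobs instead of every measured job keeps the synthetic workload small and portable while preserving the mix of compute-heavy and I/O-heavy behaviour; a production tool would need richer profiles (time series, metadata operations, parallelism) than this sketch assumes.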

2018 also saw the fifth version of the Fields Database (FDB5), a software library and internally provided service used as part of the NWP software stack. FDB5 became part of the processing chain for the Boundary Conditions programme in 2018, and it is expected to be fully implemented with the next IFS cycle upgrade, 46r1, in 2019.

NEXTGenIO has given ECMWF access to state-of-the-art technology. It enabled ECMWF to participate in the design process and to access the first prototype of an HPC system with NVRAM. The prototype was built by Fujitsu and is currently hosted in Scotland at the Edinburgh Parallel Computing Centre (EPCC). It has already been used successfully to produce a six-member ensemble forecast.

The NEXTGenIO project has received funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement no. 671951.

NEXTGenIO
Performance of the FDB5 Weather & Climate distributed object store running on the Fujitsu-built NEXTGenIO prototype, using Intel’s DCPMM NVRAM. Throughput in GiB/s measured with 8, 16 and 32 servers, compared with current software and hardware technology.

Data management at ECMWF

ECMWF produces and archives very large volumes of data. Currently, the only cost-effective way to archive and retrieve such volumes is tape, and this technology has its limits. A collaborative effort across ECMWF, coordinated by a Data Steward, therefore led to a reduction in primary data in 2018. By the end of the year, the data stored at ECMWF had reached 417 petabytes; without this effort it would have reached 474 petabytes. The data volume continued to grow in 2018, but the effort made that growth more manageable. We intend to continue deleting data that is no longer useful on a regular basis. At the same time, ECMWF is working closely with scientists to promote best practices in data storage so that the whole community can benefit.
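The two volumes quoted for the end of 2018 imply the following saving, computed directly from those figures:

```latex
\Delta V = 474\,\mathrm{PB} - 417\,\mathrm{PB} = 57\,\mathrm{PB},
\qquad
\frac{\Delta V}{474\,\mathrm{PB}} = \frac{57}{474} \approx 0.12
```

In other words, the archive ended 2018 roughly 12% smaller than it would otherwise have been.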

We are working closely with users of the archive to enable them to classify the purpose and examine the usage of their archived data. This makes it easy to identify data that has become redundant so that it can be deleted on a regular basis.
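The classification-plus-usage approach can be pictured as a simple filtering rule. The sketch below is hypothetical: the `ArchivedDataset` fields, the set of protected purposes and the two-year idle threshold are illustrative assumptions, not ECMWF’s actual policy or MARS/ECFS metadata. It only flags candidates for owners to review; it deletes nothing.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivedDataset:
    """Hypothetical archive entry; field names are illustrative only."""
    name: str
    purpose: str          # owner-supplied classification, e.g. "research"
    last_retrieved: date  # most recent retrieval seen in usage records
    size_tb: float


# Assumed policy: datasets with these purposes are never flagged automatically.
PROTECTED_PURPOSES = frozenset({"operational", "reanalysis"})


def deletion_candidates(datasets, today, max_idle=timedelta(days=730)):
    """Flag unprotected datasets not retrieved for longer than max_idle.

    The result is a list for owners to review, not an automatic deletion.
    """
    return [
        d for d in datasets
        if d.purpose not in PROTECTED_PURPOSES
        and today - d.last_retrieved > max_idle
    ]


def reclaimable_tb(candidates):
    # Total volume freed if every flagged dataset were deleted.
    return sum(d.size_tb for d in candidates)


if __name__ == "__main__":
    archive = [
        ArchivedDataset("oper-fc", "operational", date(2014, 1, 1), 40.0),
        ArchivedDataset("exp-a", "research", date(2015, 1, 1), 2.5),
        ArchivedDataset("exp-b", "research", date(2018, 6, 1), 3.0),
        ArchivedDataset("scratch-c", "test", date(2016, 1, 1), 1.0),
    ]
    flagged = deletion_candidates(archive, today=date(2018, 12, 31))
    print([d.name for d in flagged], reclaimable_tb(flagged))
    # prints: ['exp-a', 'scratch-c'] 3.5
```

Combining an owner-declared purpose with observed usage means that rarely read but essential data (here, anything classed as operational) is never flagged simply because it is idle.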

DHS size and monthly growth
Data archive growth, showing the impact of the deletion of data (negative values show the volume of deleted data). MARS is our main archive, comprising forecasts, analyses, climate reanalyses, reforecasts and multi-model datasets. ECFS is ECMWF’s File Storage system, a file-oriented client-server application that provides facilities to archive and retrieve files between local workstations or servers and the Data Handling System (DHS). “Second copy” refers to backups of important data from both systems.