In April, work on the scalability of the solver reached the point where we decided to run the first full-scale test of ESPRESO on the ORNL Titan supercomputer. The test used only the CPU part of the machine, not the GPU accelerators. The physical problem we solved was heat transfer, in other words the Laplace equation in 3D.
The method that allowed us to run such large tests is the Hybrid Total FETI (HTFETI) method, which has been implemented in the ESPRESO library under the umbrella of the EXA2CT (EXascale Algorithms and Advanced Computational Techniques) European project. For more information see the project website: www.exa2ct.eu.
The test was performed using the 3D cube benchmark generator, which is fully parallel and can generate all matrices required by the Hybrid TFETI solver in several seconds. The problem was decomposed into 17,576 clusters:
- 1 cluster of 7.2 million unknowns per compute node
- 1,210 subdomains per cluster, each with 6,859 unknowns
To sum up, a problem with 124 billion unknowns was decomposed into 21 million subdomains and solved in approximately 160 seconds, including all preprocessing required by the HTFETI method. The stopping criterion was set to 10e-3.
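The decomposition figures above can be cross-checked with a few lines of arithmetic. The following is only a sanity-check sketch in Python: the variable names are ours, not part of ESPRESO, and the per-cluster size is the rounded 7.2 million quoted in the list, so the total it produces is an estimate rather than the exact problem size.

```python
# Figures quoted in the text; variable names are illustrative only.
clusters = 17_576                 # one cluster per compute node
subdomains_per_cluster = 1_210
unknowns_per_subdomain = 6_859
unknowns_per_cluster = 7.2e6      # rounded figure from the text

# Total number of subdomains across the whole machine.
total_subdomains = clusters * subdomains_per_cluster

# Estimate of the global problem size from the rounded per-cluster size.
total_unknowns_estimate = clusters * unknowns_per_cluster

# Both counts are perfect cubes, as one would expect for a cube benchmark.
clusters_per_edge = round(clusters ** (1 / 3))
unknowns_per_subdomain_edge = round(unknowns_per_subdomain ** (1 / 3))

print(f"{total_subdomains:,} subdomains in total")
print(f"{total_unknowns_estimate / 1e9:.1f} billion unknowns (from rounded input)")
print(f"{clusters_per_edge}^3 clusters, "
      f"{unknowns_per_subdomain_edge}^3 unknowns per subdomain")
```

The subdomain count comes out at 21,266,960, matching the quoted 21 million. The estimate of about 126.5 billion unknowns is in line with the quoted 124 billion, given that the per-cluster size is a rounded figure.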
The largest run used 0.56 petabytes (PB) of memory. The Top500 list includes machines, such as Sequoia and the K computer, with total memory close to 1.5 PB. On such machines we expect ESPRESO to be able to solve over 350 billion unknowns.
The experiments show that, when scaling from 1,000 to 17,576 compute nodes, the iterative solver achieves a parallel efficiency of almost 95%. The iterative solver is the critical part of the HTFETI solver, so this is the most important observation.
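For readers unfamiliar with the metric, here is a minimal sketch of how such an efficiency figure is commonly computed. It assumes a weak-scaling setup (fixed problem size per compute node), which the fixed cluster size per node suggests but the text does not state explicitly. The function name and the timings are hypothetical placeholders chosen for illustration, not measurements from the Titan runs.

```python
def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """Weak-scaling efficiency: time at the reference node count
    divided by time at the larger node count (1.0 is ideal)."""
    if t_ref <= 0 or t_n <= 0:
        raise ValueError("timings must be positive")
    return t_ref / t_n


# HYPOTHETICAL iterative-solver timings in seconds, keyed by node count.
# These are NOT measured values from the experiment.
iterative_solver_time = {1_000: 38.0, 17_576: 40.0}

efficiency = weak_scaling_efficiency(
    iterative_solver_time[1_000],
    iterative_solver_time[17_576],
)
print(f"parallel efficiency: {efficiency:.0%}")
```

With per-node work held constant, an efficiency close to 1 means the solve time barely grows as nodes are added, which is the behavior reported for the iterative solver.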
We plan to test strong scalability and linear elasticity soon.