08 May

ESPRESO solved 124 billion unknowns on 17,576 compute nodes of the Titan supercomputer

In April, work on the scalability of the solver reached the point where we decided to run the first full-scale test of ESPRESO on the ORNL Titan supercomputer. In this test we used only the CPU part of the machine, not the GPU accelerators. The physical problem we solved was heat transfer, in other words the Laplace equation in 3D.



The method that allowed us to run such large tests is the Hybrid Total FETI (HTFETI) method, which has been implemented in the ESPRESO library under the umbrella of the EXA2CT (EXascale Algorithms and Advanced Computational Techniques) European project. For more information, see the project website: www.exa2ct.eu.

The test was performed using the 3D cube benchmark generator, which is fully parallel and can generate all matrices required by the Hybrid TFETI solver in several seconds. The problem was decomposed into 17,576 clusters:

  • 1 cluster of 7.2 million unknowns per compute node
  • 1210 subdomains per cluster, each with 6859 unknowns

To sum up, the problem of 124 billion unknowns was decomposed into 21 million subdomains and solved in approximately 160 seconds, including all preprocessing required by the HTFETI method. The stopping criterion was set to 10e-3.
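The totals quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures given in the text; the variable names are illustrative, and 124 billion is treated as the rounded value it is.

```python
# Figures quoted in the text; nothing here is measured independently.
compute_nodes = 17_576            # one cluster per compute node
subdomains_per_cluster = 1_210
total_unknowns = 124e9            # rounded figure from the text

# Total number of subdomains across the whole machine.
total_subdomains = compute_nodes * subdomains_per_cluster

# Average number of unknowns handled by one cluster (one compute node).
mean_unknowns_per_cluster = total_unknowns / compute_nodes

print(f"total subdomains: {total_subdomains:,}")
print(f"mean unknowns per cluster: {mean_unknowns_per_cluster / 1e6:.2f} million")
```

The product comes to roughly 21.3 million subdomains, in line with the rounded 21 million, and the average per cluster is about 7 million unknowns, close to the 7.2 million quoted per compute node.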

The memory used for the largest run was 0.56 petabytes (PB). The Top500 list includes machines, such as Sequoia or the K computer, with total memory close to 1.5 PB. On such machines we expect ESPRESO to be able to solve over 350 billion unknowns.

The experiments show that, when scaling from 1,000 to 17,576 compute nodes, the iterative solver exhibits a parallel efficiency of almost 95%. This is the critical part of the HTFETI solver, and therefore the most important observation.
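For readers unfamiliar with the metric, the sketch below shows one common way to compute parallel efficiency, assuming the runs were weak-scaling tests (fixed problem size per compute node), which the text implies but does not state. The function name and the timings are hypothetical placeholders, not measurements from the Titan runs.

```python
def weak_scaling_efficiency(t_reference: float, t_scaled: float) -> float:
    """Ratio of reference-run time to scaled-run time for fixed work per node."""
    if t_reference <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_reference / t_scaled


# Hypothetical placeholder timings in seconds, NOT measurements from the Titan runs.
t_1000_nodes = 100.0
t_17576_nodes = 105.0

efficiency = weak_scaling_efficiency(t_1000_nodes, t_17576_nodes)
print(f"efficiency: {efficiency:.1%}")
```

Under this definition, an efficiency near 95% means the iterative solver on the largest run took only about 5% longer than on the reference run, although each run solved a proportionally larger problem.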

We are about to test strong scalability, and also linear elasticity, soon.

26 Nov

ESPRESO to be tested on Titan, the second largest supercomputer in the world

The ESPRESO developer team gained access to the Titan machine through a Director Discretion project. The project was awarded 2,700,000 core-hours. This means that we would be able to use the entire machine for up to 5 hours. This of course will not be the case, as thousands of smaller tests will lead to a version that can efficiently use the entire supercomputer.

The main objectives of this project are as follows: (1) performance evaluation of the ESPRESO H on all 18,000 compute nodes and identification of the bottlenecks at this scale using the parallel problem generator, (2) optimization of the communication layer and all global operations, (3) development and optimization of the GPU-accelerated version at large scale, and (4) performance benchmarking of both the CPU and GPU versions using real-life problems.

More information about Oak Ridge National Lab and the Titan supercomputer can be found here: ORNLTitan.

26 Nov

Power efficient version of ESPRESO to be developed under READEX Horizon 2020 project


The goal of the READEX project is to improve the energy efficiency of applications in the field of High-Performance Computing. The project brings together European experts from different ends of the computing spectrum to develop a tools-aided methodology for dynamic auto-tuning, allowing users to automatically exploit the dynamic behaviour of their applications by adjusting the system to the actual resource requirements.

The task of IT4I consists of the evaluation of dynamism in HPC applications; manual tuning, especially of the FETI domain decomposition solvers, which combine direct and iterative methods; and evaluation and validation of the developed tool, taking the results of the manual tuning as the baseline.

More information can be found at http://www.readex.eu.

26 Nov

ESPRESO will be accelerated by Intel Xeon Phi coprocessors


The Intel® PCC at IT4Innovations National Supercomputing Center (Intel® PCC – IT4I) is developing highly parallel algorithms and libraries optimized for the latest Intel parallel technologies. The main activities of the center are divided into two pillars: Development of highly parallel algorithms and libraries, and Development and support of HPC community codes. The first pillar focuses on the development of state-of-the-art sparse iterative linear solvers combined with appropriate preconditioners and domain decomposition methods, suitable for the solution of very large problems distributed over tens of thousands of Xeon Phi accelerated nodes. The developed solvers will become part of the IT4I in-house ESPRESO (ExaScale PaRallel FETI SOlver) library. The second pillar includes creating an interface between ESPRESO and the existing community codes Elmer and OpenFOAM Extend Project.

More details about the centre can be found at http://ipcc.it4i.cz.