ESPRESO CPU

The original implementation of the ESPRESO solver was developed during IT4Innovations' participation in the EXA2CT project. The task was to develop a highly scalable solver based on the Hybrid FETI method. This algorithm has the potential to scale to today's largest petascale machines, such as Europe's largest machine, Piz Daint at CSCS, Switzerland, or the world's second-largest machine, Titan, installed at Oak Ridge National Laboratory, USA. The goal is to develop a code that exploits the potential of Hybrid FETI and runs efficiently on such machines.

The main focus of the CPU version is the development of an MPI-based communication layer designed specifically for FETI, which enables the scalability of the solver. It is not just an MPI code: like many modern parallel applications, it uses hybrid parallelization. The three levels of parallelization are message passing (MPI), threading using Cilk++, and vectorization using Intel MKL and Cilk++.

All other versions of ESPRESO (GPU and MIC), which target various many-core accelerators, are developed on top of the CPU version.

Evaluation of strong scalability (solving a problem of the same size using more and more resources) on Europe's largest machine, Piz Daint, installed at the Swiss National Supercomputing Centre (CSCS) in Lugano.

The CPU version performs all processing on general-purpose processors. It works with sparse system matrices assembled using FEM (Finite Element Method) and relies on sparse direct solvers (such as PARDISO, MUMPS, PaStiX, … ) to process them (factorization plus forward and backward substitution). The solver also supports BEM (Boundary Element Method) discretization, which builds on the great work of our colleagues from the BEM4I project at IT4Innovations (http://industry.it4i.cz/produkty/bem4i/).

The main advantage of Hybrid FETI is a significant reduction of the preprocessing time for a large number of subdomains. In the figure, every compute node processes several hundred subdomains.

Implementation details

ESPRESO CPU is implemented in C++ and is compiler- and MPI-implementation independent, although the Intel compiler suite is the preferred choice. Its only dependency is the Intel MKL (Math Kernel Library), from which it uses the Sparse and Dense BLAS routines and the PARDISO sparse direct solver.

Other sparse direct solvers can be used instead of the one from MKL. ESPRESO CPU currently supports:

  • PARDISO from the Intel Math Kernel Library
  • the original PARDISO developed at the Università della Svizzera italiana – http://www.pardiso-project.org
  • MUMPS (MUltifrontal Massively Parallel sparse direct Solver) – http://mumps.enseeiht.fr
  • support for more solvers is under active development (PaStiX, cuSOLVER, … )

Required tools:

  • Intel compiler with the Intel MKL library – sparse BLAS operations and the PARDISO direct solver
  • any MPI library (successfully tested with Cray MPI, BULLx MPI, Intel MPI, and SGI MPT) – support for the MPI 3.0 standard is a plus but is not required
  • memory utilization/requirements are fully customizable and can be tuned for any particular machine configuration