The first “larger” tests of GPU acceleration of the Hybrid Total FETI method in ESPRESO have been performed on the world’s largest GPU-accelerated machine. The problem solved is linear elasticity in 3D.
The tested method uses GPUs to accelerate the processing of the stiffness matrices, which are stored in the form of Schur complements. More about this method can be found here, or in more detail in the paper.
Because GPU memory is limited (the Tesla K20X has “only” 6 GB of RAM) and the Schur complement is stored in general format, we are able to solve only 0.3 million unknowns per GPU. We are currently implementing support for symmetric Schur complements, which will double the problem size solvable per GPU. Taking this into account, we can expect to solve 100,000 unknowns per 1 GB of RAM and therefore up to 1.6 million unknowns on the newly released Nvidia Tesla P100 accelerator with 16 GB of RAM.
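The capacity estimate above is simple arithmetic, and a short Python sketch makes the steps explicit. It uses only the figures quoted in the text and assumes, as the text does, that the solvable problem size scales linearly with GPU memory. The function and constant names are illustrative and are not part of ESPRESO.

```python
# Back-of-the-envelope capacity estimate from the figures quoted in the text.
# Assumption: solvable problem size scales linearly with available GPU RAM.

K20X_RAM_GB = 6                  # Tesla K20X memory
UNKNOWNS_GENERAL_K20X = 300_000  # measured with general (non-symmetric) storage
SYMMETRIC_STORAGE_GAIN = 2       # expected gain from symmetric Schur complements
P100_RAM_GB = 16                 # Tesla P100 memory


def unknowns_per_gb(unknowns, ram_gb, storage_gain=1):
    """Unknowns that fit into 1 GB of GPU RAM for a given storage gain."""
    return unknowns * storage_gain // ram_gb


def projected_unknowns(ram_gb, per_gb):
    """Linear projection of the problem size to a GPU with ram_gb of memory."""
    return ram_gb * per_gb


per_gb = unknowns_per_gb(UNKNOWNS_GENERAL_K20X, K20X_RAM_GB, SYMMETRIC_STORAGE_GAIN)
print(per_gb)                                   # 100000
print(projected_unknowns(P100_RAM_GB, per_gb))  # 1600000
```

The projection is only as good as the linear-scaling assumption. Anything else that occupies GPU memory would lower the real figure.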
As these tests are executed on Titan, we are comparing the AMD Opteron 6274 CPU with 16 cores and the Nvidia Tesla K20X GPU. We observe a speedup of up to 3.5 for the iterative solver when the GPU is used in conjunction with the CPU.
As described here, the preprocessing time to calculate the Schur complements is longer than in the case of factorization. Therefore, the full advantage of GPU acceleration can be taken only if the problem needs a high number (more than 400) of iterations.
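The trade-off between slower preprocessing and faster iterations can be illustrated with a simple cost model. The sketch below is not taken from ESPRESO and is not how the figure of 400 iterations was obtained. It assumes that each iteration costs a fixed time, that the GPU divides this time by a constant speedup, and that the Schur complement assembly adds a fixed extra preprocessing cost. All names and input values are hypothetical placeholders.

```python
import math


def break_even_iterations(extra_preprocessing, cpu_iter_time, speedup):
    """Smallest iteration count at which the GPU path is no slower overall.

    extra_preprocessing: additional setup time of the Schur complement
                         approach compared with factorization (hypothetical)
    cpu_iter_time:       time of one solver iteration on the CPU (hypothetical)
    speedup:             iteration speedup with the GPU, e.g. up to 3.5 in the text
    """
    if speedup <= 1:
        raise ValueError("GPU iterations must be faster for a break-even point to exist")
    gpu_iter_time = cpu_iter_time / speedup
    saved_per_iteration = cpu_iter_time - gpu_iter_time
    return math.ceil(extra_preprocessing / saved_per_iteration)


# Placeholder timings in seconds, chosen for illustration only.
print(break_even_iterations(extra_preprocessing=30.0, cpu_iter_time=0.1, speedup=3.5))
```

Under this model, problems that converge in fewer iterations than the break-even count are better served by the factorization-based path, and the benefit of the GPU grows with every iteration beyond it.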