Lubomír Říha
Computational scientist in HPC / electrical engineer

Research Scientist at IT4Innovations. His research interests include: efficient parallelization and acceleration of scientific applications on multi- and many-core architectures (such as GPUs and Intel Xeon Phi) using various parallelization techniques; multi-core and GPU acceleration of fast indexing and search in multidimensional databases; development and optimization of application-level communication protocols (communication-efficient work distribution on heterogeneous clusters with multiple GPU accelerators per node and an InfiniBand interconnect; communication-hiding and communication-avoiding techniques for FETI solvers); and architecture optimization for specific application workloads.

Work Experience

2013 - present

Research Scientist, HPC Enabling Expert
IT4Innovations National Supercomputing Center

Acceleration of iterative sparse linear solvers using heterogeneous accelerators. Implementation of massively parallel Hybrid FETI solvers.

09/2014 – 12/2014

Visiting Research Scientist
NASA Goddard Space Flight Center, Greenbelt, MD, USA

Parallelization of dimension-reduction techniques for hyperspectral remote sensing and of cloud-detection algorithms for the Intel Xeon Phi and Nvidia Kepler GPU many-core architectures.

04/2014 – 07/2014

Visiting Research Scientist
Farhat Research Group, Department of Aeronautics and Astronautics Stanford University, Stanford, California

Initial stage of development of the ExaScale PaRallel FETI SOlver (ESPRESO), a highly efficient parallel solver that implements several FETI-based algorithms, including the new Hybrid Total FETI method suitable for the world's largest parallel machines.

01/2011 – 06/2013

Research Scientist and System Administrator
High Performance Computing Laboratory (HPCL) The George Washington University, Department of Electrical and Computer Engineering, Washington, DC

Research and development in the area of high-performance scientific, engineering, and business-analytics applications.

01/2008 – 01/2011

Research Assistant and System Administrator
Computer Science Department, Bowie State University, Bowie, MD

Design and implementation of real-time image-processing algorithms for: (a) computer clusters (using MPI), (b) FPGAs (using VHDL and Matlab), (c) the Cell processor (using C and the Cell BE SDK), and (d) GPUs (using C, C#, and NVIDIA CUDA). Experience in developing embedded Linux systems for sensor networks. System administration of XSEED, an Apple G5-based HPC cluster.

2011 - present

VSB - Technical University of Ostrava, Ostrava, Czech Republic; The George Washington University, Washington, DC

VSB-TUO: Programming in the HPC Environment

GWU: Introduction to High-Performance Computing; Parallel Computer Architecture


Education

03/2006 – 03/2011

Ph.D. in Electrical Engineering
Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Measurement

Dissertation thesis: Discriminating acoustic emission events on the basis of similarity. (Development of parallel signal-processing algorithms for acoustic emission testing (Matlab and C# .NET); GPU parallelization of the developed algorithms (Nvidia CUDA); FPGA design (VHDL – Xilinx ISE).)

01/2008 – 05/2012

D.Sc. in Computer Science
Bowie State University, Department of Computer Science, Bowie, MD, USA

Dissertation thesis: GPU accelerated algorithms for multiple object tracking. (Development of parallel image-processing algorithms (Matlab, Nvidia CUDA, OpenMP, C, and C# .NET).)


MSc. in Electrical Engineering
Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Measurement

Diploma thesis: Design of a special video-processor in FPGA. (Signal and image processing (Matlab), FPGA design (VHDL – Xilinx ISE), and ADSP-21xx programming – assembler for DSP (Analog Devices).)

Research Interests

High Performance Computing, Accelerated computing using GPU and Intel Xeon Phi, FETI methods


Intel Parallel Computing Center at IT4Innovations
Co-Principal developer of ESPRESO library and its Xeon Phi acceleration

The main activities of the Intel® PCC at IT4I are divided into two pillars. The development pillar, covering highly parallel algorithms and libraries, focuses on developing state-of-the-art sparse iterative linear solvers combined with appropriate preconditioners and domain decomposition methods, suitable for solving very large problems distributed over tens of thousands of nodes accelerated by Intel® Xeon Phi™ coprocessors. The developed solvers will become part of the IT4I in-house ESPRESO (ExaScale PaRallel FETI SOlver) library. The second pillar, support of HPC community codes, includes creating interfaces between ESPRESO and the existing community codes Elmer and OpenFOAM Extend Project.

Co-Principal Investigator at IT4Innovations

The energy consumption of supercomputers increases as they approach exascale. The main goal of the participating institutions is to develop an auto-tuning tool that makes computations and simulations more energy efficient by employing new scenarios and techniques that change software and hardware parameters, such as the frequency of computational cores. The task of IT4I consists of evaluating dynamism in HPC applications; manual tuning, especially of the FETI domain decomposition solvers combining direct and iterative methods; and evaluating and validating the developed tool, taking the results of the manual tuning as the baseline.


EXA2CT

This project takes a revolutionary approach to exascale linear-equation solvers and programming models by bringing together experts in solver development and HPC software architects. The EXA2CT project focuses on three main areas for future exascale applications, each corresponding to a specific Work Package (WP): WP1, development of numerical algorithms for the exascale; WP2, next-generation programming models; and WP3, proto-applications capturing the scalability problems of real-life applications. In 2014, IT4I contributed to WP1. The main results concern the application of communication-hiding and communication-avoiding techniques in the Pipelined Conjugate Gradient (PIPECG) method, used in the TFETI (Total Finite Element Tearing and Interconnecting) domain decomposition method and its hierarchical modification, HFETI. These methods are implemented in our PERMON and ESPRESO libraries.

Other Projects

– Multi-core (OpenMP) and GPU (CUDA) acceleration of business analytics applications (multi-dimensional databases and OLAP) – with IBM Centers for Advanced Studies (CAS), Toronto, Canada


– Multi-core and GPU acceleration of indexing and search in multidimensional databases


– Intelligent Display – Architectural design of cloud based client device – with AMD Inc.


– Architecture optimization for specific application workloads


– Methods for Communication Efficient Work Distributions on Heterogeneous Clusters – designed for clusters with multiple GPU accelerators per node and InfiniBand interconnect

Software Development

2014 - present

ExaScale PaRallel FETI SOlver

ESPRESO (ExaScale PaRallel FETI SOlver) is a sparse iterative solver based on the Finite Element Tearing and Interconnecting (FETI) methods. The solver uses the Hybrid FETI method, based on a multi-level decomposition, which significantly improves scalability to tens of thousands of compute nodes solving tens of billions of unknowns. ESPRESO also supports both Nvidia GPU and Intel Xeon Phi accelerators, which bring significant speed-ups for problems requiring a high number of iterations.

In addition to the core solver, the ESPRESO package contains a FEM/BEM library which is also under active development. This library provides interfaces to Elmer (through an API) and to ANSYS and OpenFOAM (through project files).

Teaching and Lecturing Experience
Programming in the HPC Environment
Graduate level course (MSc. level), VSB-TU Ostrava
Introduction to High-Performance Computing
Graduate level courses (MSc. and Ph.D. level), The George Washington University
Parallel Computer Architecture
Graduate level courses (MSc. and Ph.D. level), The George Washington University

Full-day tutorial on Intel Xeon Phi programming