Resume: Lubomír Říha

Lubomír Říha
Computational scientist in HPC / electrical engineer
Ostrava, Czech Republic
Research scientist at IT4Innovations. His research interests include: efficient parallelization and acceleration of scientific applications on multi- and many-core architectures (such as GPUs and Intel Xeon Phi) using different parallelization techniques; multi-core and GPU acceleration of fast indexing and search in multidimensional databases; development and optimization of application-level communication protocols (communication-efficient work distributions on heterogeneous clusters with multiple GPU accelerators per node and InfiniBand interconnect; communication-hiding and -avoiding techniques for FETI solvers); and architecture optimization for specific application workloads.
Work Experience
Research Scientist, HPC Enabling Expert
2013 - present
IT4Innovations National Supercomputing Center
Acceleration of iterative sparse linear solvers using heterogeneous accelerators. Implementation of massively parallel Hybrid FETI solvers.
Visiting Research Scientist
09/2014 – 12/2014
NASA Goddard Space Flight Center, Greenbelt, MD, USA
Parallelization of dimension reduction techniques for hyperspectral remote sensing and cloud detection algorithms for Intel Xeon Phi and Nvidia Kepler GPU many-core architectures.
Visiting Research Scientist
04/2014 – 07/2014
Farhat Research Group, Department of Aeronautics and Astronautics Stanford University, Stanford, California
Initial stage of the development of the ExaScale PaRallel FETI Solver (ESPRESO), a highly efficient parallel solver which contains several FETI-based algorithms, including a new Hybrid Total FETI method suitable for the world's largest parallel machines.
Research Scientist and System administrator
01/2011 – 06/2013
High Performance Computing Laboratory (HPCL) The George Washington University, Department of Electrical and Computer Engineering, Washington, DC
Research and development in the area of high performance scientific, engineering and business analytics applications.
Research Assistant and System Administrator
01/2008 – 01/2011
Computer Science Department, Bowie State University, Bowie, MD
Design and implementation of real time image processing algorithms for: (a) computer cluster (using MPI), (b) FPGA (using VHDL, Matlab), (c) CELL (using C and CELL BE SDK) and (d) GPU (using C, C# and NVIDIA CUDA). Experience in development of Linux embedded systems for Sensor Networks. System administration of Apple G5 based HPC cluster XSEED.
Lecturer
2011 - present
VSB - Technical University of Ostrava, Ostrava, Czech Republic; The George Washington University, Washington, DC
VSB-TUO: Programming in the HPC Environment. GWU: Introduction to High-Performance Computing; Parallel Computer Architecture.
Education
Ph.D. in Electrical Engineering
03/2006 – 03/2011
Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Measurement
Dissertation thesis: Discriminating acoustic emission events on the basis of similarity. (Development of parallel signal processing algorithms for acoustic emission testing (Matlab and C# .NET); GPU parallelization of developed algorithms (Nvidia CUDA); FPGA design (VHDL - Xilinx ISE))
D.Sc. in Computer Science
01/2008 – 05/2012
Bowie State University, Department of Computer Science, Bowie, MD, USA
Dissertation thesis: GPU accelerated algorithms for multiple object tracking. (Development of parallel image processing algorithms (Matlab, Nvidia CUDA, OpenMP, C and C# .NET))
MSc. in Electrical Engineering
Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Measurement
Diploma thesis: Design of a special video-processor in FPGA. (Signal and image processing (Matlab), FPGA design (VHDL - Xilinx ISE), and ADSP-21xx programming - assembler for DSP (Analog Devices))
Research Interests

High Performance Computing, Accelerated computing using GPU and Intel Xeon Phi, FETI methods

Intel Parallel Computing Center at IT4Innovations
Co-principal developer of the ESPRESO library and its Xeon Phi acceleration
Main activities of the Intel® PCC at IT4I are divided into two pillars. The development pillar, highly parallel algorithms and libraries, focuses on the development of state-of-the-art sparse linear iterative solvers combined with appropriate preconditioners and domain decomposition methods, suitable for solving very large problems distributed over tens of thousands of Intel® Xeon Phi™ coprocessor-accelerated nodes. The developed solvers will become part of the IT4I in-house ESPRESO (ExaScale PaRallel FETI SOlver) library. The support of HPC community codes includes creating an interface between ESPRESO and the existing community codes Elmer and OpenFOAM Extend Project.
Co-Principal Investigator at IT4Innovations
Supercomputer energy consumption increases as exascale approaches. The main goal of the participating institutions is to develop an autotuning tool that makes computations and simulations more energy efficient by employing new scenarios and techniques that change software and hardware parameters, e.g. the frequency of the computational cores. IT4I's task consists of evaluating dynamism in HPC applications, manual tuning, especially of the FETI domain decomposition solvers combining direct and iterative methods, and evaluating and validating the developed tool, taking the results of the manual tuning as the baseline.
This project takes a revolutionary approach to exascale linear equation solvers and programming models by bringing together experts in solver development and HPC software architects. The EXA2CT project focuses on three main areas for future exascale applications, each corresponding to a specific Work Package (WP): WP1, development of numerical algorithms for the exascale; WP2, next-generation programming models; WP3, proto-applications capturing the scalability problems of real-life applications. In 2014, IT4I contributed to WP1. The main results concern the application of communication-hiding and -avoiding techniques in the Pipelined Conjugate Gradient (PIPECG) method, used in the TFETI (Total Finite Element Tearing and Interconnecting) domain decomposition method and its hierarchical modification HFETI. These methods are implemented in our PERMON and ESPRESO libraries.
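To illustrate the communication-hiding idea mentioned above, the following is a minimal single-process sketch of an unpreconditioned pipelined CG iteration in NumPy; it is not code from PERMON or ESPRESO. The key point is that both dot products of an iteration are grouped together, so that in a distributed run they could be fused into one non-blocking allreduce whose latency is hidden behind the local matrix-vector product.

```python
import numpy as np

def pipelined_cg(A, b, tol=1e-10, max_iter=200):
    """Unpreconditioned pipelined CG (illustrative sketch).

    In an MPI version, the two reductions (gamma, delta) would be a
    single MPI_Iallreduce (communication avoiding) overlapped with the
    extra matrix-vector product n = A @ w (communication hiding)."""
    x = np.zeros_like(b)
    r = b - A @ x                 # initial residual
    w = A @ r
    z = s = p = np.zeros_like(b)  # auxiliary recurrence vectors
    gamma_old = alpha = None
    for _ in range(max_iter):
        gamma = r @ r             # } these two reductions would be one
        delta = w @ r             # } non-blocking allreduce, hidden by:
        n = A @ w                 # local SpMV overlapping the reduction
        if gamma_old is None:
            beta, alpha = 0.0, gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)
        # recurrences replacing the explicit A @ p of classical CG
        z = n + beta * z
        s = w + beta * s
        p = r + beta * p
        x = x + alpha * p
        r = r - alpha * s
        w = w - alpha * z
        gamma_old = gamma
        if np.sqrt(gamma) < tol:
            break
    return x
```

Compared with classical CG, the extra recurrences trade one additional matrix-vector product and a few vector updates for the ability to overlap the global reduction, which is the dominant cost at very large node counts.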
Other Projects
- Multi-core (OpenMP) and GPU (CUDA) acceleration of business analytics applications (multi-dimensional databases and OLAP), with IBM Centers for Advanced Studies (CAS), Toronto, Canada
- Multi-core and GPU acceleration of indexing and search in multidimensional databases
- Intelligent Display: architectural design of a cloud-based client device, with AMD Inc.
- Architecture optimization for specific application workloads
- Methods for communication-efficient work distributions on heterogeneous clusters, designed for clusters with multiple GPU accelerators per node and InfiniBand interconnect
Software Development
2014 - present
ExaScale PaRallel FETI SOlver
ESPRESO (ExaScale PaRallel FETI SOlver) is a sparse iterative solver based on the Finite Element Tearing and Interconnect (FETI) methods. The solver uses the Hybrid FETI method based on a multi-level decomposition, which significantly improves scalability to tens of thousands of compute nodes solving tens of billions of unknowns. ESPRESO also supports both Nvidia GPU and Intel Xeon Phi accelerators, which bring significant speedups for problems requiring a high number of iterations. In addition to the core solver, the ESPRESO package contains a FEM/BEM library which is also under active development. This library contains an interface to Elmer (through an API) and to ANSYS and OpenFOAM (through project files).
Teaching and Lecturing Experience
Programming in the HPC Environment
Graduate level course (MSc. level), VSB-TU Ostrava
Introduction to High-Performance Computing
Graduate level courses (MSc. and Ph.D. level), The George Washington University
Parallel Computer Architecture
Graduate level courses (MSc. and Ph.D. level), The George Washington University
Full-day tutorial on Intel Xeon Phi programming
Technical Skills
High Performance Computing
C, C++ with MPI, UPC and OpenMP (5 years) – Linux

GPU/CUDA programming
C, C++ and C# with CUDA (6 years) – MS Windows, Linux

Intel Xeon Phi programming
C, C++ for Intel Xeon Phi - Linux (3 years)

Signal and Image Processing (6 years)

FPGA development
VHDL, Xilinx ISE, ModelSim (5 years)

Embedded Linux Systems
C for embedded Linux systems (2 years)

Assembler for low-level programming
ADSP-2180 assembler (2 years), x51 assembler (2 years), x86 assembler (1 year)

HPC Linux system administration
Linux-based homogeneous and heterogeneous (GPU, Intel Xeon Phi) clusters with InfiniBand interconnect. SW: CentOS, ROCKS with Sun Grid Engine, Warewulf with SLURM; CRAY XE6m (CLE with SLURM) (4 years)
In-Situ Visualization in ESPRESO