Research scientist at IT4Innovations. His research interests include efficient parallelization and acceleration of scientific applications on multi- and many-core architectures (such as GPUs and Intel Xeon Phi) using different parallelization techniques; multi-core and GPU acceleration of fast indexing and search in multidimensional databases; development and optimization of application-level communication protocols (communication-efficient work distributions on heterogeneous clusters with multiple GPU accelerators per node and InfiniBand interconnect; communication-hiding and communication-avoiding techniques for FETI solvers); and architecture optimization for specific application workloads.
2013 - present
Acceleration of iterative sparse linear solvers using heterogeneous accelerators. Implementation of massively parallel Hybrid FETI solvers.
09/2014 – 12/2014
Parallelization of dimension reduction techniques for hyperspectral remote sensing and cloud detection algorithms for Intel Xeon Phi and Nvidia Kepler GPU many-core architectures.
04/2014 – 07/2014
Initial stage of the development of the ExaScale PaRallel FETI Solver (ESPRESO), a highly efficient parallel solver that contains several FETI-based algorithms, including the new Hybrid Total FETI method, which is suitable for the world's largest parallel machines.
01/2011 – 06/2013
Research and development in the area of high performance scientific, engineering and business analytics applications.
01/2008 – 01/2011
Design and implementation of real-time image processing algorithms for: (a) computer clusters (using MPI), (b) FPGA (using VHDL, Matlab), (c) CELL (using C and CELL BE SDK) and (d) GPU (using C, C# and NVIDIA CUDA). Experience in the development of embedded Linux systems for sensor networks. System administration of the Apple G5 based HPC cluster XSEED.
2011 - present
VSB-TUO: Programming in the HPC Environment
GWU: Introduction to High-Performance Computing; Parallel Computer Architecture
03/2006 – 03/2011
Dissertation thesis: Discriminating acoustic emission events on the basis of similarity. (Development of parallel signal processing algorithms for acoustic emission testing (Matlab and C# .NET); GPU parallelization of developed algorithms (Nvidia CUDA); FPGA design (VHDL – Xilinx ISE))
01/2008 – 05/2012
Dissertation thesis: GPU accelerated algorithms for multiple object tracking. (development of parallel image processing algorithms (Matlab, Nvidia CUDA, OpenMP, C and C# .NET)).
Diploma thesis: Design of a special video-processor in FPGA. (signal and image processing (Matlab), FPGA design (VHDL – Xilinx ISE) and ADSP-21xx programming – assembler for DSP (Analog Devices))
High Performance Computing, Accelerated computing using GPU and Intel Xeon Phi, FETI methods
The main activities of the Intel® PCC at IT4I are divided into two pillars. The first pillar, the development of highly parallel algorithms and libraries, focuses on state-of-the-art sparse iterative linear solvers combined with appropriate preconditioners and domain decomposition methods, suitable for solving very large problems distributed over tens of thousands of nodes accelerated by Intel® Xeon Phi™ coprocessors. The solvers being developed will become part of the IT4I in-house ESPRESO (ExaScale PaRallel FETI SOlver) library. The second pillar, the support of HPC community codes, includes creating an interface between ESPRESO and the existing community codes Elmer and OpenFOAM Extend Project.
The energy consumption of supercomputers increases as they approach exascale. The main goal of the participating institutions is to develop an autotuning tool that makes computations and simulations more energy efficient by employing new scenarios and techniques that change software and hardware parameters, such as the frequency of the computational cores. The task of IT4I consists of the evaluation of dynamism in HPC applications; manual tuning, especially of the FETI domain decomposition solvers, which combine direct and iterative methods; and the evaluation and validation of the developed tool, taking the results of the manual tuning as the baseline.
This project takes a revolutionary approach to exascale linear equation solvers and programming models by bringing together experts in solver development and HPC software architects. The EXA2CT project focuses on three main areas for future exascale applications, each corresponding to a specific Work Package (WP): WP1. Development of numerical algorithms for the exascale; WP2. Next-generation programming models; WP3. Proto-applications, capturing scalability problems of a real-life application. In 2014, IT4I contributed to WP1. The main results concern the application of communication-hiding and communication-avoiding techniques in Pipelined Conjugate Gradients (PIPECG), used in the TFETI (Total Finite Element Tearing and Interconnecting) domain decomposition method and its hierarchical modification HFETI. These methods are implemented in our PERMON and ESPRESO libraries.
– Multi-core (OpenMP) and GPU (CUDA) acceleration of business analytics applications (multi-dimensional databases and OLAP) – with IBM Centers for Advanced Studies (CAS), Toronto, Canada
– Multi-core and GPU acceleration of indexing and search in multidimensional databases
– Intelligent Display – Architectural design of a cloud-based client device – with AMD Inc.
– Architecture optimization for specific application workloads
– Methods for Communication Efficient Work Distributions on Heterogeneous Clusters – designed for clusters with multiple GPU accelerators per node and InfiniBand interconnect
2014 - present
ESPRESO (ExaScale PaRallel FETI SOlver) is a sparse iterative solver based on the Finite Element Tearing and Interconnecting (FETI) methods. The solver uses the Hybrid FETI method, which is based on a multi-level decomposition that significantly improves scalability, up to tens of thousands of compute nodes solving tens of billions of unknowns. ESPRESO also supports both Nvidia GPU and Intel Xeon Phi accelerators, which bring a significant speedup for problems requiring a high number of iterations.
In addition to the core solver, the ESPRESO package contains a FEM/BEM library, which is also under active development. This library contains an interface to ELMER (through an API) and to ANSYS and OpenFOAM (through project files).