# Domain Decomposition¶

$\begin{split}\begin{array}{rl} \text{CP} & \text{Coarse Problem }\\ \text{DD} & \text{Domain Decomposition }\\ \text{DOF} & \text{Degrees of Freedom }\\ \text{FETI} &\text{ Finite Tearing and Interconnecting DD method }\\ \text{TFETI} &\text{ Total FETI method }\\ \text{HTFETI} &\text{ Hybrid Total FETI method}\\ \text{SpMV} & \text{Sparse Matrix Vector Multiplication }\\ \text{RHS} &\text{ Right-Hand Side }\\ \text{RBM} &\text{ Rigid Body Motions }\\ \text{SpDS} &\text{ Sparse Direct Solver }\\ \mathbf{u}_i &\text{ subdomain nodal displacements }\\ \boldsymbol{\lambda} &\text{ Lagrange multipliers }\\ \mathbf{K}_{i} & \text{non-regularized subdomain stiffness matrix }\\ \hat{\mathbf{K}}_{i} & \text{regularized subdomain stiffness matrix }\\ \mathbf{T}_{i} &\text{ subdomain regularization matrix }\\ \mathbf{M}_{i} & \text{subdomain preconditioner matrix }\\ \mathbf{R}_{i} & \text{subdomain kernel matrix }\\ \mathbf{B}_{1,i} & \text{subdomain gluing matrix - constraint matrix }\\ \mathbf{B}_{0,i} & \text{subdomain corner matrix }\\ \mathbf{F}_0 & \text{cluster FETI operator }\\ \mathbf{G}_0 & \text{cluster matrix }\\ \mathbf{S}_0 & \text{cluster coarse problem matrix }\\ \mathbf{G}^\top \mathbf{G} & \text{coarse problem matrix of the TFETI method }\\ \end{array}\end{split}$

For more than two decades, the Finite Element Tearing and Interconnecting method (FETI, for more detail see, e.g., in [5], [7]) has been successfully used in the engineering community, primarily for very large problems arising from the discretization of partial differential equations. In such an approach the original structure is artificially decomposed into several non-overlapping subdomains. Mutual continuity of primal variables between neighboring subdomains is enforced afterwards by dual variables, i.e., Lagrange multipliers (LM). They are usually obtained iteratively by some of the Krylov subspace methods, then the primal solution is evaluated locally on each subdomain. The FETI method is also applicable for contact problems with inequality constrains (see, e.g., [3], [4])

In 2006 Dostál et al. [2] introduced a new variant of an algorithm called Total FETI (or TFETI) in which Dirichlet boundary condition is enforced also by LM. Please note, there is no difference, if the hybrid variant stems from FETI or TFETI. The impact is identical in both cases. This paper addresses the modification of the TFETI variant, and is therefore called HTFETI.

The HTFETI method is a variant of hybrid FETI methods introduced by Klawonn and Rheinbach (see, e.g., [8], [9]) for FETI and FETI-DP. In the original approach, the authors gather a number of subdomains in clusters which provides a three-level domain decomposition approach. Each cluster consists of a certain number of subdomains and for these, a FETI-DP system is set up. The clusters are then solved by a traditional FETI approach using projections to treat the non trivial kernels. In contrast, in HTFETI, a TFETI approach is used for the subdomains in each cluster and the FETI approach using projections for clusters.

The main advantage of HTFETI is the ability to solve problems decomposed into a very large number of subdomains.

Fig. 1 The two approaches how to use HTFETI method and their comparison with TFETI for problems of size up to 3 billion DOF.

To solve extremely large problems, the configuration of the HTFETI has to be different (Approach A: 1 cluster per compute node with large number of subdomains - up to several thousand) than for solving moderate problems (Approach B: 1 cluster per CPU core with small number of subdomains - up to a few hundred). In particular, Approach B can use subdomains of approximately 1,000 DOF to solve a 1 billion DOF problem on approximately 200 compute nodes with 64 GB of RAM. Solving the same problem with TFETI requires subdomains of approximately 90,000 DOF.

The comparison of HTFETI and TFETI methods is shown in Fig. 1. Using TFETI with large number of DOF per subdomain is an example of how to solve large problems using this technique. The figure shows that its scalability is not ideal, but is acceptable up to 3 billion DOF. However, the solution time in this range is still shorter than it would be with the use of the HTFETI technique. The figure clearly shows that for problems greater than 3 billion DOF, the HTFETI Approach A is better.

However, the HTFETI can be configured to the other extreme in which small number of subdomains per cluster (here 216) and very small number of DOF per subdomain (here 1536 DOF) are used. This approach is significantly faster than TFETI with large number of DOF per subdomain. Figure also shows that TFETI can be configured with smaller number of DOF per subdomain (here 20,577) to reduce processing time. However, this option is suitable only for very small problems as its scalability is unsatisfactory.

In sum, the figure shows that HTFETI provides better solution time for “smaller problems” up to 3 billion DOF and better scalability for large problems.

In the next section, the theory behind the HTFETI is described, followed by the description of the parallel implementation in the ESPRESO library.

# Hybrid Total FETI Method¶

The FETI method, widely known for more than two decades, is an efficient tool for solving large-scale problems in structural mechanics via Krylov subspace methods adapted to parallel machines. Although this paper is focused on the HTFETI, this section will first introduce the original FETI method on a simple cantilever beam

Fig. 2 Cantilever beam, FEM discretization

followed by an introduction of its hybrid variant.

## FETI¶

In the simple engineering problem depicted in Fig. 2, the aim is to get deformation of the structure. A cantilever beam, fixed on the left side and loaded by the pressure $$p_y$$ at the top, is discretizated with FEM [11]. The number of nodes is $$n_n^0$$, and number of elements is $$n_e$$. Above the mesh global stiffness matrix $${\mathbf{K}}_g$$ and global RHS $${\mathbf{f}}_g$$ (nodal forces) are assembled, whereas both objects are included into nodal equilibrium equation

(1)${\mathbf{K}}_g{\mathbf{u}}_g={\mathbf{f}}_g.$

To get the vector $${\mathbf{u}}_g$$ (nodal displacements), Dirichlet boundary condition has to be taken into account due to a singular matrix $${\mathbf{K}}_g$$ since it has a non-empty null-space. Then the linear system can be solved by an iterative or direct solver.

Clearly, this undecomposed approach has its own limitations. The size of $${\mathbf{K}}_g$$ can overload the computer memory which is one of the reasons one might employ domain decomposition methods. In the FETI case, the mesh is decomposed into four smaller submeshes (see Fig. 3-a)

Fig. 3 Domain decomposition

to avoid assembling the global objects $$\mathbf{K}_g$$, $$\mathbf{f}_g$$. Decomposition generally causes an increase in the number of nodes in cuts between subdomains, thus the global size of the unknowns of the decomposed problem is always bigger than the original one. In fact, it is not a problem, because any object in a FETI algorithm stored at one computational node does not have such dimension. Equilibrium equation for $$i$$-th subdomain is

(2)$\mathbf{K}_i \mathbf{u}_i = \mathbf{f}_i-\mathbf{B}^\top_i \boldsymbol{\lambda},~ i\in (1,2,3,4),$

where $$\mathbf{K}_i$$ and $$\mathbf{f}_i$$ with the same meaning as before are assembled for all subdomains separately. On the RHS beside the vector $$\mathbf{f}_i$$ the product $$\mathbf{B}^\top_i \boldsymbol{\lambda}$$ appears. It is actually an additive nodal force vector which acts on the interface between subdomains and it substitutes influence transmitting from neighbors. Those four systems in (2) are supplemented by the global constraint condition

(3)$\sum_{i=1}^{4} \mathbf{B}_i\mathbf{u}_i=\mathbf{o}.$

Equilibrium equations (2) together with condition (3) can be written globally as

(4)$\begin{split} \left( \begin{array}{lllll} \mathbf{K}_1 & \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{B}_{1}^\top \\ \mathbf{O} & \mathbf{K}_2 & \mathbf{O} & \mathbf{O} & \mathbf{B}_{2}^\top \\ \mathbf{O} & \mathbf{O} & \mathbf{K}_3 & \mathbf{O} & \mathbf{B}_{3}^\top \\ \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{K}_4 & \mathbf{B}_{4}^\top\\ \mathbf{B}_{1}& \mathbf{B}_{2}& \mathbf{B}_{3} & \mathbf{B}_{4}& \mathbf{O} \end{array} \right) \left(\begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \mathbf{u}_3 \\ \mathbf{u}_4 \\ \boldsymbol{\lambda} \end{array}\right)= \left(\begin{array}{c} \mathbf{f}_1 \\ \mathbf{f}_2 \\ \mathbf{f}_3 \\ \mathbf{f}_4 \\ \mathbf{c} \end{array}\right)\end{split}$

or shortly

(5)$\begin{split} \begin{array}{l} \mathbf{K}\mathbf{u} + \mathbf{B}^\top\boldsymbol{\lambda} = \mathbf{f}, \\ \mathbf{B}\mathbf{u} = \mathbf{c}. \end{array}\end{split}$

The meaning of symbols in (4) is obvious to detailed expression in (5). Generally, vector $$\mathbf{c}$$ contains zero entries. Vector of LM

$\boldsymbol{\lambda} = \left( (\boldsymbol{\lambda}^{d,1})^\top,~ (\boldsymbol{\lambda}^{g,12})^\top,~ (\boldsymbol{\lambda}^{g,23})^\top,~ (\boldsymbol{\lambda}^{g,34})^\top \right)^\top$

consists of four parts where the first subvector $$\boldsymbol{\lambda}^{d,1}$$ enforces Dirichlet boundary condition, the second one $$\boldsymbol{\lambda}^{g,12}$$ enforces connectivity between subdomains 1 and 2, etc. The division of global $$\boldsymbol{\lambda}$$ into four parts does not relate to the number of subdomains. These four sets correspond to four cuts caused by the decomposition. The splitting into four subvectors shows how $$\boldsymbol{\lambda}$$ can be stored according to numeric superindexes. For instance, the first part $$\boldsymbol{\lambda}^{d,1}$$ is related to the first subdomain only, and it will never touch another one, therefore it is stored only on the first subdomain. The second part $$\boldsymbol{\lambda}^{g,12}$$ connects subdomain 1 and 2, thus they are stored only there. Evidently, there is no need to assemble the global $$\boldsymbol{\lambda}$$ vector on a single node.

In the next step, the primal variables will be eliminated. From Eq. (5) the vector of unknown displacements is

(6)$\mathbf{u} = \mathbf{K}^{+} (\mathbf{f}-\mathbf{B}^\top \boldsymbol{\lambda}) + \mathbf{R} \boldsymbol{\alpha},$

where $$\mathbf{R}=\text{diag}\left(\mathbf{R}_1,~\mathbf{R}_2,~\mathbf{R}_3,~\mathbf{R}_4\right)$$ is block-diagonal matrix, the columns of which define the basis of null space of $$\mathbf{K}$$, and the vector $$\boldsymbol{\alpha}$$ contains its amplitudes. Mutual relation between them is

(7)$\mathbf{K}\mathbf{R}=\mathbf{O}~\text{or}~\mathbf{R}^\top\mathbf{K}=\mathbf{O}.$

The term (6) with the second condition in (5) eliminates the primal variables

(8)$\mathbf{B}\mathbf{K}^{+} \mathbf{f}-\mathbf{B}\mathbf{K}^{+} \mathbf{B}^\top \boldsymbol{\lambda} + \mathbf{B}\mathbf{R} \boldsymbol{\alpha}=\mathbf{c}$

similarly, as a combination of the first equation from (5) and Eq (7).

(9)$\mathbf{R}^\top\mathbf{Ku}+\mathbf{R}^\top\mathbf{B}^\top\boldsymbol{\lambda} = \mathbf{R}^\top\mathbf{f}.$

The last two terms together can be expressed as

(10)$\begin{split} \begin{array}{l} \mathbf{F}\boldsymbol{\lambda} + \mathbf{G}\boldsymbol{\alpha} = \mathbf{d}, \\ \mathbf{G}^\top\boldsymbol{\lambda} = \mathbf{e} \end{array}\end{split}$

in which the following substitutions are used

(11)$\begin{split} \begin{array}{ll} \mathbf{F}=\mathbf{BK}^+\mathbf{B}^\top-\mathbf{c}, & \mathbf{G}=-\mathbf{BR}, \\ \mathbf{d}=\mathbf{BK}^+\mathbf{f}, & \mathbf{e}=-\mathbf{R}^\top \mathbf{f}. \end{array}\end{split}$

In the previous steps, the primal variables $$\mathbf{u}$$ were eliminated. The newly derived system of linear equations (10) can be favorably solved by iterative methods, e.g., the conjugate gradient method (CGM).

$\begin{split}\begin{array}{rl} 1. & \text{set:} ~ \varepsilon>0,~i_{max}>0, ~\boldsymbol{\lambda}_0,\\ 2. & \overline{\mathbf{g}_0} = \mathbf{g}_0 +\mathbf{G}\boldsymbol{\alpha}_0 ~~ \text{where} ~~ \mathbf{g}_0 = \mathbf{F}\boldsymbol{\lambda}_0 -\mathbf{d}\\ 3. & \mathbf{w}_0 = \overline{\mathbf{g}_0} \\ 4. & \mathbf{do}~i=0,1,...,i_{max} \\ 5. & \hspace{5mm} \rho_i = -(\overline{\mathbf{g}_{i}},\overline{\mathbf{g}_{i}})/ (\mathbf{w}_i,\mathbf{Fw}_i) \\ 6. & \hspace{5mm}\boldsymbol{\lambda}_{i+1} = \boldsymbol{\lambda}_{i+1} + \mathbf{w}_i \rho_i \\ 7. & \hspace{5mm}\overline{\mathbf{g}_{i+1}}=\mathbf{g}_{i+1}+\mathbf{G}\boldsymbol{\alpha}_i,~~\mathbf{g}_{i+1} = \mathbf{g}_{i} + \mathbf{Fw}_i \rho_i\\ 8. & \hspace{5mm}\mathbf{if}~\|\overline{\mathbf{g}_{i+1}}\|<\varepsilon \\ 9. & \hspace{10mm} \mathbf{break} \\ 10. & \hspace{5mm} \mathbf{end if} \\ 11.&\hspace{5mm} \boldsymbol{\gamma} = (\overline{\mathbf{g}_{i+1}},\overline{\mathbf{g}_{i+1}}) / (\overline{\mathbf{g}}_{i},\overline{\mathbf{g}_{i}})\\ 11. & \hspace{5mm} \mathbf{w}_{i+1} = \overline{\mathbf{g}_{i+1}} + \mathbf{w}_i \gamma_i\\ 12. & \mathbf{end}~\mathbf{do} \\ & \\ & \hspace{15mm}\mathbf{Algorithm}~1. \end{array}\end{split}$

An approximation of LM used in Algorithm $$1$$ in $$i$$-th iteration is considered in the form

(12)$\boldsymbol{\lambda}_i = \boldsymbol{\lambda}_o + \sum_{j=0}^{i}\mathbf{w}_j \rho_j.$

If an initial guess is chosen as a linear combination of basis vectors of matrix $$\mathbf{G}$$ in the form of $$\boldsymbol{\lambda}_0=\mathbf{G} \mathbf{y}$$, the vector $$\mathbf{y}$$ can be simply determined using the second equation in (10) as

$\boldsymbol{\lambda}_0 = (\mathbf{G}^\top\mathbf{G})^{-1}\mathbf{e}.$

Such initial guess fully accomplishes the second equality in (10), therefore the rest of the approximation in (12) lies in the kernel of $$\mathbf{G}$$.

During the iterative process, just the partial gradient $$\mathbf{g}_i$$ is kept in the memory, and before using the complete form $$\overline{\mathbf{g}_{i+1}}$$ it is modified by $$\boldsymbol{\alpha}_i$$ to satisfy condition $$\mathbf{G}^\top\overline{\mathbf{g}_{i+1}}=\mathbf{o}$$. Firstly, it appears if the initial conjugate vector $$\mathbf{w}_0$$ is used as a contribution to the $$\boldsymbol{\lambda}_0$$. To keep $$\mathbf{w}_0$$ in the kernel of $$\mathbf{G}$$, vector

$\boldsymbol{\alpha}_0 = -(\mathbf{G}^\top\mathbf{G})^{-1} \mathbf{G}^\top(\mathbf{F}\boldsymbol{\lambda}_0 -\mathbf{d})$

has to be evaluated to satisfy the condition

$\mathbf{G}^\top\mathbf{w}_0 = \mathbf{G}^\top\overline{\mathbf{g}_0} = \mathbf{G}^\top(\mathbf{F}\boldsymbol{\lambda}_0 -\mathbf{d}) + \mathbf{G}^\top\mathbf{G} \boldsymbol{\alpha}_0=\mathbf{o}.$

The fulfillment of the condition above is equivalent to the projection of the part of the gradient

$\mathbf{w}_0=\overline{\mathbf{g}}_0= \mathbf{P} (\mathbf{F}\boldsymbol{\lambda}_0 -\mathbf{d})= \mathbf{P}\mathbf{g}_0$

by the orthogonal projector

$\mathbf{P} = \mathbf{I}-\mathbf{G}(\mathbf{G}^\top\mathbf{G})^{-1} \mathbf{G}^\top.$

Due to the projection applied to each iteration, Algorithm 1 is called Projected Conjugate Gradient Method (PCGM) (for more details see [6]). The iterative process can be accelerated by Lumped or Dirichlet preconditioner, in this case the preconditioned gradient has to be projected again (for more detail see, e.g., [10]).

## From FETI to HTFETI¶

FETI algorithm described in subsection FETI is efficient (besides other effects related to the condition of the system etc.) until CP

$\mathbf{G}^\top\mathbf{G}\in\mathbb{R}^{n_{cp}\times n_{cp}}$

can be effectively factorized from the time point of view. Its dimension $$n_{cp}$$ in linear elasticity depends on the number of subdomains $$n_{sub}$$ (in 2D: $$n_{cp}=3\cdot n_{sub}$$, in 3D: $$n_{cp}=6\cdot n_{sub}$$), and it is equal to all RBM in a decomposed structure. In Fig. 3, as illustrated using the FETI technique, each subdomain of the decomposed beam has three RBM (in 2D), therefore $$n_{cp}=3 \cdot 4 = 12$$. CP dimension reduction by HTFETI can be demonstrated even in this small and simple benchmark. The permutation and splitting of matrix $$\mathbf{B}$$ according to Fig. 3-c reads

$\begin{split} \left( \begin{array}{llll|ll|l} \mathbf{K}_1 & \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{B}_{0,1}^\top & \mathbf{O} & \mathbf{B}_{1,1}^\top \\ \mathbf{O} & \mathbf{K}_2 & \mathbf{O} & \mathbf{O} & \mathbf{B}_{0,2}^\top & \mathbf{O} & \mathbf{B}_{1,2}^\top \\ \mathbf{O} & \mathbf{O} & \mathbf{K}_3 & \mathbf{O} & \mathbf{O} & \mathbf{B}_{0,3}^\top & \mathbf{B}_{1,3}^\top \\ \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{K}_4 & \mathbf{O} & \mathbf{B}_{0,4}^\top & \mathbf{B}_{1,4}^\top\\ \hline \mathbf{B}_{0,1} & \mathbf{B}_{0,2}& \mathbf{O}& \mathbf{O}& \mathbf{O}& \mathbf{O}& \mathbf{O}\\ \mathbf{O}& \mathbf{O}& \mathbf{B}_{0,3} & \mathbf{B}_{0,4}& \mathbf{O}& \mathbf{O}& \mathbf{O}\\ \hline \mathbf{B}_{1,1}& \mathbf{B}_{1,2}& \mathbf{B}_{1,3} & \mathbf{B}_{1,4}& \mathbf{O}& \mathbf{O}& \mathbf{O} \end{array} \right) \left(\begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \mathbf{u}_3 \\ \mathbf{u}_4 \\ \hline \boldsymbol{\lambda}_{0,1} \\ \boldsymbol{\lambda}_{0,2} \\\hline \boldsymbol{\lambda}_{1} \end{array}\right)= \left(\begin{array}{c} \mathbf{f}_1 \\ \mathbf{f}_2 \\ \mathbf{f}_3 \\ \mathbf{f}_4 \\ \hline \mathbf{c}_{0,1} \\ \mathbf{c}_{0,2} \\\hline \mathbf{c}_{1} \end{array}\right)\end{split}$

where matrix

$\begin{split}\mathbf{B}_0= \left(\begin{array}{llll} \mathbf{B}_{0,1} & \mathbf{B}_{0,2}& \mathbf{O}& \mathbf{O} \\ \mathbf{O}& \mathbf{O} & \mathbf{B}_{0,3} & \mathbf{B}_{0,4} \end{array}\right)\end{split}$

consists of constraints gluing subdomains into 2 clusters, and

$\mathbf{B}_1= \left(\begin{array}{llll} \mathbf{B}_{1,1} & \mathbf{B}_{1,2}& \mathbf{B}_{1,3} & \mathbf{B}_{1,4} \end{array}\right)$

contains the rest of the equality constraints. This permuted system of the linear equation still has the structure for a FETI algorithm as in (4). The next equation shows the system after the permutation

(13)$\begin{split} \left( \begin{array}{lll|lll|l} \mathbf{K}_1 & \mathbf{O} & \mathbf{B}_{0,1}^\top & \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{B}_{1,1}^\top \\ \mathbf{O} & \mathbf{K}_2 & \mathbf{B}_{0,2}^\top & \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{B}_{1,2}^\top \\ \mathbf{B}_{0,1} & \mathbf{B}_{0,2}& \mathbf{O}& \mathbf{O}& \mathbf{O}& \mathbf{O}& \mathbf{O}\\ \hline \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{K}_3 & \mathbf{O} & \mathbf{B}_{0,3}^\top & \mathbf{B}_{1,3}^\top \\ \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{O} & \mathbf{K}_4 & \mathbf{B}_{0,4}^\top & \mathbf{B}_{1,4}^\top\\ \mathbf{O}& \mathbf{O}& \mathbf{O}& \mathbf{B}_{0,3} & \mathbf{B}_{0,4}& \mathbf{O}& \mathbf{O}\\ \hline \mathbf{B}_{1,1}& \mathbf{B}_{1,2}& \mathbf{O}& \mathbf{B}_{1,3}& \mathbf{B}_{1,4}& \mathbf{O}& \mathbf{O} \end{array} \right) \left(\begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \boldsymbol{\lambda}_{0,1} \\ \hline \mathbf{u}_3 \\ \mathbf{u}_4 \\ \boldsymbol{\lambda}_{0,2} \\ \hline \boldsymbol{\lambda}_{1} \end{array}\right) = \left(\begin{array}{c} \mathbf{f}_1 \\ \mathbf{f}_2 \\ \mathbf{c}_{0,1} \\ \hline \mathbf{f}_3 \\ \mathbf{f}_4 \\ \mathbf{c}_{0,2} \\ \hline \mathbf{c}_{1} \end{array}\right).\end{split}$

This system can be rewritten in a simplified form

(14)$\begin{split} \left( \begin{array}{l|l|l} \tilde{\mathbf{K}}_{1} & \mathbf{O} & \tilde{\mathbf{B}}^\top_{1} \\ \hline \mathbf{O} & \tilde{\mathbf{K}}_{2} & \tilde{\mathbf{B}}^\top_{2} \\ \hline \tilde{\mathbf{B}}_{1} & \tilde{\mathbf{B}}_{2} & \mathbf{O} \end{array} \right) \left(\begin{array}{l} \tilde{\mathbf{u}}_1 \\ \hline \tilde{\mathbf{u}}_2 \\ \hline \tilde{\boldsymbol{\lambda}} \end{array} \right) = \left(\begin{array}{l} \tilde{\mathbf{f}}_1 \\ \hline \tilde{\mathbf{f}}_2 \\ \hline \tilde{\mathbf{c}} \end{array} \right)\end{split}$

which can be solved by HTFETI method. The classical FETI method belongs to the group of dual Schur complement methods in which the primal variables $$\mathbf{u}$$ are eliminated and satisfied exactly in each iteration. Dual variables $$\boldsymbol{\lambda}$$, reaction forces between subdomains, are obtained by an iterative solver. In the case of HTFETI, not only the primal variables but also a subset of dual variables $$\boldsymbol{\lambda}_0$$ are eliminated. This modification causes subdomains 1 and 2 to be partially glued onto interface into cluster 1, and subdomains 3 and 4 into cluster 2. After that, the structure behaves like a problem decomposed into two parts, and the dimension of CP is two times smaller accordingly ($$n_{cp}= 3 \cdot n_c = 3 \cdot 2=6$$ where $$n_c$$ is the number of clusters). The important thing is that the linear system (14) can be handled by the same methodology applied onto in subsections FETI and Projected Conjugate Gradient Method, so long as each object is relabeled by a tilde.

## HTFETI - Optimal implementation¶

In the following text the methodology will be described on the first cluster and its subdomains 1, 2.

To keep optimal properties of the HTFETI code, the cluster stiffness matrix $$\tilde{\mathbf{K}}_1$$ can not be assembled, but used implicitly. It requires modified routines for factorization and kernel detection. The factorization is obtained by solving $$\tilde{\mathbf{K}}_1\tilde{\mathbf{x}}=\tilde{\mathbf{b}}$$ more detailed written as

(15)$\begin{split} \left( \begin{array}{ll} \mathbf{K}_{1:2} & \mathbf{B}^\top_{0,1:2} \\ \mathbf{B}_{0,1:2} & \mathbf{O} \end{array} \right) \left(\begin{array}{l} \mathbf{x} \\ \boldsymbol{\mu} \end{array}\right) = \left(\begin{array}{l} \mathbf{b} \\ \mathbf{z} \end{array}\right)\end{split}$

with $$\mathbf{K}_{1:2}=\text{diag}(\mathbf{K}_1,~\mathbf{K}_2)$$ and $$\mathbf{B}_{0,1:2}=(\mathbf{B}_{0,1},~\mathbf{B}_{0,2})$$. Notation $$1:2$$ points to the first and last ordinal number of the subdomains on the cluster. So for the second cluster it would be $$3:4$$. The system (15) can be taken as a FETI problem to be solved not iteratively but by a direct solver. After the dualization it reads

$\begin{split}:label:eq:KKTdual_kplus \left( \begin{array}{ll} \mathbf{F}_{0,1:2} & \mathbf{G}_{0,1:2} \\ \mathbf{G}^\top_{0,1:2} & \mathbf{O} \end{array} \right) \left(\begin{array}{l} \boldsymbol{\mu} \\ \boldsymbol{\beta} \end{array}\right) = \left(\begin{array}{l} \mathbf{d}_{0,1:2} \\ \mathbf{e}_{0,1:2} \end{array}\right)\end{split}$

with substitutions

$\begin{split}\begin{array}{lr} \mathbf{F}_{0,1:2}=\mathbf{B}_{0,1:2}\mathbf{K}_{1:2}^+\mathbf{B}_{0,1:2}^\top-\mathbf{z}, & \mathbf{G}_{0,1:2}=-\mathbf{B}_{0,1:2}\mathbf{R}_{1:2}, \\ \mathbf{d}_{0,1:2}=\mathbf{B}_{0,1:2}\mathbf{K}_{1:2}^+\mathbf{b}, & \mathbf{e}_{0,1:2}=-\mathbf{R}_{1:2}^\top \mathbf{b}. \end{array}\end{split}$

To obtain the vector $$\tilde{\mathbf{x}}=(\mathbf{x}^\top,~\boldsymbol{\mu}^\top)^\top$$, both systems , are subsequently solved in three steps

$\begin{split} \begin{array}{l} \boldsymbol{\beta} = \mathbf{S}_{0,1:2}^+ \left(\mathbf{G}^\top_{0,1:2}\mathbf{F}_{0,1:2}^{-1}\mathbf{g}_{0,1:2}-\mathbf{e}_{0,1:2}\right)\\ \boldsymbol{\mu} = \mathbf{F}_{0,1:2}^{-1} \left(\mathbf{g}_{0,1:2}-\mathbf{G}_{0,1:2}^\top\boldsymbol{\beta}\right) \\ \mathbf{x} = \mathbf{K}_{1:2}^+ \left(\mathbf{b}-\mathbf{B}_{0,1:2}^\top\boldsymbol{\mu}\right)+ \mathbf{R}_{1:2}\boldsymbol{\beta} \end{array}\end{split}$

in which singular Schur complement

$\mathbf{S}_{0,1:2}=\mathbf{G}^\top_{0,1:2}\mathbf{F}_{0,1:2}^{-1}\mathbf{G}_{0,1:2}^\top$

appears.

In the HTFETI method, not only do the subdomain matrices $$\mathbf{K}_i$$ have to be factorized, but also $$\mathbf{F}_{0,1:2}$$ and $$\mathbf{S}_{0,1:2}$$ - one pair on each cluster. The dimension of $$\mathbf{F}_{0,1:2}$$ is controlled by the number of LM which glue subdomains of the cluster, and the dimension of $$\mathbf{S}_{0,1:2}$$ is given by the number of subdomains per cluster multiplied by the defect of $$\mathbf{K}_i$$.

To get the kernel $$\tilde{\mathbf{R}}_1$$ of the first cluster, the following term is written

(16)$\begin{split} \left( \begin{array}{ll|l} \mathbf{K}_{1} & \mathbf{O} & \mathbf{B}^\top_{0,1} \\ \mathbf{O} & \mathbf{K}_{2} & \mathbf{B}^\top_{0,2} \\ \hline \mathbf{B}_{0,1} & \mathbf{B}_{0,2} & \mathbf{O} \end{array} \right) \left(\begin{array}{l} \mathbf{R}_1~ \mathbf{O} \\ \mathbf{O}~~ \mathbf{R}_2 \\ \hline \mathbf{O}~~ \mathbf{O} \end{array} \right) \left(\begin{array}{l} \mathbf{H}_1 \\ \mathbf{H}_2 \end{array} \right) = \left(\begin{array}{l} \mathbf{O} \\ \mathbf{O} \\\hline \mathbf{O} \end{array} \right)\end{split}$

or shortly

(17)$\begin{split} \left( \begin{array}{l|l} \mathbf{K}_{1:2} & \mathbf{B}^\top_{0,1:2} \\ \hline \mathbf{B}_{0,1:2} & \mathbf{O} \end{array} \right) \left(\begin{array}{c} \mathbf{R}_{1:2} \\ \hline \mathbf{O} \end{array} \right) \mathbf{H}_{1:2} = \left(\begin{array}{c} \mathbf{O} \\\hline \mathbf{O} \end{array} \right).\end{split}$

where the kernel is given by

$\tilde{\mathbf{R}}_1=(\mathbf{R}_{1:2}^\top,~\mathbf{O})^\top\mathbf{H}_{1:2}.$

Assuming that the subdomian kernels $$\mathbf{R}_1$$ and $$\mathbf{R}_2$$ are already known, the determination of $$\mathbf{H}_{1:2}$$ remains. The first equation

$\mathbf{K}_{1:2}\mathbf{R}_{1:2}\mathbf{H}_{1:2}=\mathbf{O}$

from does not impose any special conditions onto $$\mathbf{H}_{1:2}$$, because the product $$\mathbf{K}_{1:2} \mathbf{R}_{1:2}$$ is already a zero matrix. The second equation reads

(18)$\mathbf{B}_{0,1:2} \mathbf{R}_{1:2} \mathbf{H}_{1:2} = -\mathbf{G}_{0,1:2}^\top \mathbf{H}_{1:2} = \mathbf{O}$

and it says, $$\mathbf{H}_{1:2}$$ is kernel of $$\mathbf{G}_{0,1:2}$$. It can be beneficial if kernels of subdomains stored in $$\mathbf{R}_{1,2}$$ are obtained analytically and from one coordinate system (see, e.g., eq. (3.7) in ). Then the searched kernel is

$\mathbf{H}_{1:2}= \left(\begin{array}{cc} \mathbf{I}_{3,3}& \mathbf{I}_{3,3}\end{array} \right)^\top$

where $$\mathbf{I}_{3,3}\in \mathbb{R}^{3\times3}$$ is an identity matrix. Matrix $$\mathbf{H}_{1:2}$$ is also the kernel of $$\mathbf{S}_{0,1:2}$$, which can be regularized by it and then easily factorized.

Reference

 [1] ESPRESO - Exascale Parallel FETI Solver, http://espreso.it4i.cz.
 [2] Dostal; D. Horak; and R. Kucera: Total FETI-an easier implementable variant of the FETI method for numerical solution of elliptic PDE. Communications in Numerical Methods in Engineering, 22(12):1155–1162, jun 2006.
 [3] Dostal; T. Kozubek; T. Brzobohaty; A. Markopoulos; and O. Vlach.: Scalable TFETI with optional preconditioning by conjugate projector: for transient frictionless contact problems of elasticity. Computer Methods in Applied Mechanics and Engineering, 247-248:37–50, nov 2012.
 [4] Z. Dostal, T. Kozubek, V. Vondrak, T. Brzobohaty, and A. Markopoulos.: Scalable TFETI algorithm for the solution of multibody contact problems of elasticity. International Journal for Numerical Methods in Engineering, pages n/a–n/a, 2009.
 [5] Farhat, J. Mandel, and F.-X. Roux.: Optimal convergence properties of the FETI domain decomposition method. Computer Methods in Applied Mechanics and Engineering, 115:365–385, 1994.
 [6] C. Farhat and F.-X. Roux.: A method of finite element tearing and interconnecting and its parallel solution algorithm. International Journal for Numerical Methods in Engineering, 32(6):1205–1227, oct 1991.
 [7] Gosselet and C. Rey.: Non-overlapping domain decomposition methods in structural mechanics. Archives of Computational Methods in Engineering, 13(4):515–572, 2006.
 [8] Klawonn and O. Rheinbach.: Highly scalable parallel domain decomposition methods with an application to biomechanics. ZAMM, 90(1):5–32, jan 2010.
 [9] Klawonn, M. Lanser, and O. Rheinbach: Toward extremely scalable nonlinear domain decomposition methods for elliptic partial differential equations. SIAM J. Sci. Comput., 37(6):C667–C696, jan 2015.
 [10] Toselli and O. B. Widlund.: Domain Decomposition Methods {textemdash} Algorithms and Theory. Springer-Verlag, 2005.
 [11] Zienkiewicz and R. L. Taylor.: The finite element method. fifth edition. Bautechnik, 79(2):122–123, feb 2002.