Code_Aster ®  Version 8.1
Title: Implementation of the FETI algorithm
Date: 15/09/05
Author(s): O. BOITEAU
Key: D9.03.01-A
Organization(s): EDF-R&D/SINETICS
Software Description Manual, booklet D9.03
Document: D9.03.01

Implementation of the FETI algorithm
Summary:
This document describes the software implementation of the FETI algorithm for the resolution of linear
systems. It takes over the notation of the Reference note [R6.01.03] [bib1] and also relies on that of the
User [U4.50.01] and Development [D4.06.05], [D4.06.07], [D4.06.10], [D4.06.11] and [D4.06.21]
documents. The reader will find here the simplified flow chart of the resolution process, in sequential as in
parallel mode, which brings out its main logical articulations, its call tree, and the main variables involved
together with their contents. The specificities and the philosophy of the parallelism that has been set up are
detailed in particular.
On the other hand, it was chosen not to burden the presentation by mentioning the calls to the supervisor
routines, to the JEVEUX object management and low-level handling routines (VTDEFS, VTCREB…), and
the details of the routines preliminary to the unfolding of the FETI algorithm (ASSMAM, MEACMV,
MERESO…).
1  Simplified flow chart

Main routines | Global function | Parallelism specificity

Before the command calling FETI
  · Reading of the computation data (mesh, materials…), pre-processing…
    Entry into the possible computation loops: load increments, Newton
    steps…
  [All the processors have carried out the same operations up to this
  level and therefore have access to the whole of the known JEVEUX data
  (mesh, fields resulting from the pre-processing…).]

CRESOL
  · Preparation of the solver data: master SOLVEUR SD and slave SDs.
  [Processor j is assigned the sub-domains j1, j2… jk. The master SOLVEUR
  SD is built by each proc. and its .FETS points only to the SOLVEUR SDs
  of the slave sub-domains jk:  .FETS -> SD SOLVEUR jk]

NUMERO
  · Renumbering and symbolic factorization, constitution of the master
    NUME_DDL SD and of the slave ones.
  [Idem:  .FETN -> SD NUME_DDL jk]

ASSMAM / ASSVEC
  · Assembly of the stiffness matrices and of the local second-member
    vectors. Filling of the master MATR_ASSE/CHAM_NO SDs and of the slave
    ones.
  [Idem on MATR_ASSE/CHAM_NO:  .FETM -> SD MATR_ASSE jk,
   .FETC -> SD CHAM_NO jk]

PRERES
  · Factorization of the local stiffness matrices (K_i)^+ and search for
    their rigid body modes B_i.
  [Each proc. j computes the data relating to its perimeter of
  sub-domains: (K_jk)^+ and B_jk.]

RESOUD / ALFETI
  · Resolution via the FETI algorithm itself, see [Figure 1-b].

Figure 1-a: Simplified flow chart before ALFETI
Main routines | Global function | Parallelism specificity

FETGGT
  · Construction of G_I := [R_1 B_1 … R_q B_q] and of G_I^T G_I.
  [JEVEUX objects known only by proc. 0.]

FETINL
  · Computation of e^0 := [f_1^T B_1 … f_q^T B_q]^T and of
    lambda^0 := G_I [G_I^T G_I]^(-1) e^0.
  [e^0 known only by proc. 0, lambda^0 by all.]

FETRIN
  · Construction of r^0 := sum_i R_i (K_i)^+ (f_i - R_i^T lambda^0).
  [Known only by proc. 0.]

FETPRJ
  · Computation of g^0 := P r^0 = (I - G_I [G_I^T G_I]^(-1) G_I^T) r^0.
  [Known by all procs.]

FETSCA / FETPRC
  · Computation of h~^0 := A M^(-1) A g^0.
  [Known only by proc. 0.]

FETPRJ
  · Computation of h^0 := P h~^0 and p^0 = h^0.
  [Known by all procs.]

Stopping test: g^0 = 0?
  · yes: computation of the local solution u_i^sol on each sub-domain and
    of the associated alpha, then reconstruction of the global solution
    u^sol.  [u_i^sol known by the proc. concerned, u^sol by proc. 0.]
  · no: GCPPC loop, see [Figure 1-c].

Figure 1-b: Simplified flow chart in ALFETI (level 1)
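For readability, the initialization stage sketched in [Figure 1-b] can be summarized in the notation of
[R6.01.03]; this is only a condensed restatement of the quantities listed above, not a substitute for the
reference note:

\[
\begin{aligned}
G_I &:= [\,R_1 B_1 \;\cdots\; R_q B_q\,], &
e^0 &:= [\,f_1^T B_1 \;\cdots\; f_q^T B_q\,]^T,\\
\lambda^0 &:= G_I\,[G_I^T G_I]^{-1}\,e^0, &
r^0 &:= \sum_i R_i\,(K_i)^+\,(f_i - R_i^T \lambda^0),\\
g^0 &:= P\,r^0 \quad\text{with}\quad P := I - G_I\,[G_I^T G_I]^{-1} G_I^T, &
h^0 &:= P\,A\,M^{-1}\,A\,g^0,\qquad p^0 := h^0 .
\end{aligned}
\]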
Iteration k

FETFIV
  · Construction of z^k := sum_i R_i (K_i)^+ R_i^T p^k.
  [Known only by proc. 0.]

DDOT
  · Computation of the descent parameter alpha^k.
  [Known only by proc. 0.]

DAXPY / FETPRJ
  · Updates: lambda^(k+1) = lambda^k + alpha^k p^k and
    g^(k+1) = g^k - alpha^k P z^k.
  [lambda^(k+1) known only by proc. 0, g^(k+1) by all.]

Stopping test: ||g^(k+1)|| < RESI_RELA ||g^0|| ?
  · yes: computation of the local solution u_i^sol on each sub-domain and
    of the associated alpha, then reconstruction of the global solution
    u^sol.  [u_i^sol known by the proc. concerned, u^sol by proc. 0.]
  · no: continue.

FETSCA / FETPRC / FETPRJ
  · Computation of h^(k+1) := P A M^(-1) A g^(k+1).
  [Known only by proc. 0.]

FETREO
  · Update of the descent direction p^(k+1) (reorthogonalized or not).
  [Known by all procs.]

Figure 1-c: Simplified flow chart in ALFETI (level 2)
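Similarly, iteration k of the GCPPC loop of [Figure 1-c] can be summarized as follows (same notation; the
reorthogonalized variant of the last step is detailed with [Table 2-3]):

\[
\begin{aligned}
z^k &:= \sum_i R_i\,(K_i)^+\,R_i^T\,p^k, &
\alpha^k &:= \frac{g^k \cdot p^k}{z^k \cdot p^k},\\
\lambda^{k+1} &:= \lambda^k + \alpha^k p^k, &
g^{k+1} &:= g^k - \alpha^k\,P\,z^k,\\
h^{k+1} &:= P\,A\,M^{-1}\,A\,g^{k+1}, &
p^{k+1} &:= h^{k+1}\ \text{(reorthogonalized or not)},
\end{aligned}
\]

with the stopping test \( \|g^{k+1}\| < \mathrm{RESI\_RELA}\cdot\|g^0\| \).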
2  Detailed flow chart and call tree
Calling routine | Called routines (level 1, level 2) | Details | Processor information

Before the command calling FETI (MECA_STATIQUE…)
  · Reading of the computation data (mesh, materials, SD_FETI
    [D4.06.21]…), pre-processing…
    Entry into the possible computation loops: load increments, Newton
    steps…
  [All the procs have carried out the same operations up to this level
  and therefore have access to the whole of the known JEVEUX data (mesh,
  fields resulting from the pre-processing…).]

CRESOL
  · Reading of the parameters attached to the SOLVEUR factor keyword
    [U4.50.01],                                                   [O*/O]
  · Creation and initialization of the related objects "&FETI.INFO…"
    (for the monitoring [D4.06.21 §4]) and SDFETI(1:8)//".MAILLE.NUMSD"
    (for the assembly routines [D4.06.21 §4]),                    [P/P]
  · Creation of the pointer SOLVEUR_master.FETS towards the slave
    SOLVEUR SDs.
  [The master SOLVEUR SD is built by each proc. and its .FETS points only
  to the SOLVEUR SDs of the slave sub-domains jk.]

Note:
* The code O/N means that the operation is carried out, or that the variable is known, only by the master
proc. (O for Oui) and not by the other procs (N for Non). P (Partiellement) is also used to indicate that the
operations carried out relate only to the sub-domains of the perimeter of the running proc.

FETMPI
  · Distribution of the sub-domains per proc. and determination of the
    number of procs.                       [MPI_COMM_SIZE, MPI_COMM_RANK]

Loop on the sub-domains concerned** by the running proc.
  CRESO1
    · Constitution of the SOLVEUR SDs [D4.06.11] of the sub-domains
      ("slaves"),                                                 [P/P]
    · Checks meshes of the model / meshes of the sub-domains.
                                                          [In sequential]
End of loop

CRESO1
  · Constitution of the SOLVEUR SD of the global domain ("master"),
                                                                  [O/O]
  · Checks meshes of the model / meshes of the sub-domains.
                                                          [In sequential]

Note:
** In sequential, all the sub-domains are concerned by the running processor, which is also the master
processor or proc. 0. In parallel, the running proc. j is assigned a set of contiguous sub-domains: j1,
j2… jk. In the loops on the sub-domains, this information is conveyed via the JEVEUX object
"&FETI.LISTE.SD.MPI" ([D4.06.21 §4]) which filters the loop indices.

NUMERO
  · Constitution of the list of loads global to the whole model.  [O/O]
NUMER2
  · Constitution of the master NUME_DDL [D4.06.07] and of its pointer
    .FETN.                                                        [O/O]
  [The .FETN points only to the SDs of the slave sub-domains jk.]

NUMER2 / NUEFFE
  · Creation of the master NUME_EQUA.                             [O/O]

NUMER2 / PROFMA
  · The master SD STOCKAGE is not created,                        [O/O]
  · Checking of the coherence of the SD_FETI with respect to the model
    and to the loads (controlled by the keyword VERIF_SDFETI).    [O/O]

FETMPI
  · Determination of the number of procs and of the rank of the running
    proc.                                  [MPI_COMM_SIZE, MPI_COMM_RANK]

Loop on the sub-domains concerned by the running proc.
  EXLIM1
    · Creation of the LIGREL of the physical meshes of the sub-domain.
                                                                  [P/P]
  EXLIM2
    · Constitution of the list of the load LIGRELs (with late meshes)
      impacting the sub-domain,                                   [P/P]
    · Their possible projections onto several sub-domains. Ad hoc filling
      of the .FELi SDs associated with these projections.         [P/P]
  NUMER2
    · Constitution of the slave NUME_DDL.                         [P/P]
  NUMER2 / NUEFFE
    · Creation of the slave NUME_EQUA.                            [P/P]
  NUMER2 / PROFMA
    · Creation of the slave SD STOCKAGE.                          [P/P]
End of loop

  · Test of the identity of the PROF_CHNO.NUEQ ([D4.06.07 §5.3]). [O/N]

ASSMAM / ASSVEC
  · Constitution of the master MATR_ASSE/CHAM_NO [D4.06.10], [D4.06.05]
    and of their pointers .FETM/.FETC. The useless MATR_ASSE.VALE is not
    built.                                                        [O/O]
  [Their .FETM/.FETC point only to the SDs of the slave sub-domains jk.]

Loop on the sub-domains concerned by the running proc.
  · Constitution of the slave MATR_ASSE/CHAM_NO relying on the auxiliary
    objects:                                                      [P/P]
    - SDFETI(1:8)//".MAILLE.NUMSD", which determines, in the loops on the
      global data, whether the considered mesh concerns the sub-domain,
    - .FELi, to make the link between the numbering of the meshes or of
      the late nodes and their local numbering within the sub-domain
      (that of the slave NUME_DDL).
End of loop

PRERES
  · Update of the field "DOCU" of the master MATR_ASSE.REFA in order to
    go through RESOUD and the recopy of the second member into the
    solution vector.                                              [O/O]

Loop on the sub-domains concerned by the running proc.
  FETFAC / TLDLG2
    · Filling of the slave MATR_ASSE.VALF of the sub-domains associated
      with the running proc.: (K_i)^+,                            [P/P]
    · Computation of the rigid body modes and filling of the temporary
      objects "&&FETFAC.FETI.MOCR" (rigid modes) and
      "&&FETFAC.FETI.INPN" (indices of the quasi-null pivots),    [P/P]
    · Checks of the rigid body modes and of the Moore-Penrose conditions
      (if INFO_FETI(6:6)='T' is activated [U4.50.01 §3.5]).       [P/P]
End of loop

FETFAC
  · Filling of the MATR_ASSE objects .FETF, .FETP and .FETR
    ([D4.06.10 §3]): B_i.                                         [P/P]

RESOUD
  Loop on the sub-domains concerned by the running proc.
    · Check that the PROF_CHNO of the MATR_ASSE is identical to that of
      the second member (for the master and the slave SDs),       [P/P]
    · Check that the global MATR_ASSE and its slave MATR_ASSE have indeed
      been factorized.                                            [P/P]
  End of loop

RESFET
  Loop on the sub-domains concerned by the running proc.
    · Update of the temporary objects &INT of the local MATR_ASSE. [P/P]
  End of loop

ALFETI
  · The FETI algorithm itself, see the following tables [Table 2-2] and
    [Table 2-3].

Table 2-1: Detailed flow chart and call tree before ALFETI
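To fix ideas on the FETMPI step of [Table 2-1] (query of the number of processors and of the rank, then
contiguous assignment of the sub-domains), here is a minimal self-contained sketch in Fortran 77/MPI; the
names (DISTSD, NBSD, NBASE, NLOC…) are hypothetical and this is not the actual Code_Aster routine:

      PROGRAM DISTSD
C     Sketch only (hypothetical names, not the actual FETMPI routine):
C     contiguous distribution of NBSD sub-domains over MPI processes.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER IER, NBPROC, RANG, NBSD, NBASE, RESTE, NLOC, IDEB, IFIN
      PARAMETER (NBSD = 12)
      CALL MPI_INIT(IER)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NBPROC, IER)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANG, IER)
C     Contiguous block j1..jk assigned to the running processor
      NBASE = NBSD/NBPROC
      RESTE = MOD(NBSD, NBPROC)
      IF (RANG .LT. RESTE) THEN
         NLOC = NBASE + 1
         IDEB = RANG*(NBASE+1) + 1
      ELSE
         NLOC = NBASE
         IDEB = RESTE*(NBASE+1) + (RANG-RESTE)*NBASE + 1
      ENDIF
      IFIN = IDEB + NLOC - 1
      WRITE(*,*) 'proc.', RANG, ': sub-domains', IDEB, 'to', IFIN
      CALL MPI_FINALIZE(IER)
      END

The contiguity of the blocks is precisely the constraint recalled for the MPI_GATHERV step of [Table 2-2]
(STOCKAGE_GI='OUI' and contiguous distribution of the sub-domains per proc.).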
Calling routine | Called routines (level 1, level 2) | Details | Processor information

ALFETI / FETMPI
  · Determination of the number of procs and of the rank of the running
    proc.                                  [MPI_COMM_SIZE, MPI_COMM_RANK]

Loop on the sub-domains concerned by the running proc.
  · Initialization of the collections of vectors dimensioned by the
    number of DOFs (physical and late) of each sub-domain:
    "&&FETI.COLLECTIONR" and "&&FETI.COLLECTIONL". They will be used for
    the matrix operations of the size of the local problems (the second
    one is restricted to the matrix-vector product of the
    preconditioning).                                             [P/P]
End of loop

FETING
  Loop on the sub-domains concerned by the running proc.
    · Constitution of the collection of vectors dimensioned by the number
      of interface Lagranges of the sub-domains: "&&FETI.COLLECTIONI". It
      is used to make the link between the numbering of the interface
      Lagranges in the list of nodes of the sub-domains (SDFETI.FETB) and
      that of the local PROF_CHNO resulting from the symbolic
      factorization.                                              [P/P]
  End of loop
  · Initialization of the temporary objects related to the
    reorthogonalization (REORTH, NBREOR…) and of temporary vectors
    (K24IRR, K24LAI…),                                            [O/O]

Loop on the sub-domains concerned by the running proc.
  · Creation of temporary objects in order to save later JEVEUO calls:
    K24REX, K24FIV…                                               [P/P]
End of loop

FETMPI
  · Reduction then global broadcast of the master MATR_ASSE object .FETF
    so that all the procs know how many rigid body modes a given
    sub-domain has.
    [In parallel (tested by the presence of at least 2 processors), O/O,
    MPI_ALLREDUCE + MPI_SUM]

FETMPI
  · Idem for the monitoring object "&FETI.INFO.STOCKAGE.FVAL",
    [In parallel, O/O, MPI_ALLREDUCE + MPI_SUM]
  · So as to keep the procs synchronized, the same thing is done, without
    global broadcast, for the other monitoring objects "&FETI.INFO…".
    [O/N, MPI_REDUCE + MPI_SUM]

FETGGT
  Loop on the sub-domains concerned by the running proc. (FETREX)
    · Construction of the rectangular matrix G_I := [R_1 B_1 … R_q B_q]
      (NOMGI='&&FETI.GI.R').                                      [P/P]
  End of loop

FETGGT / FETMPI
  · Construction of the complete G_I by selective gathering towards
    proc. 0. CAUTION: it is here that the constraints intervene:
    STOCKAGE_GI='OUI' is compulsory in parallel, and the sub-domains must
    be distributed in a contiguous way per proc.
    [In parallel, O/N, MPI_GATHERV]

FETGGT / BLAS DDOT
  · Construction of the square matrix G_I^T G_I
    (NOMGGT='&&FETI.GITGI.R').                           [If proc. 0, O/N]

FETMON
  · Monitoring if INFOFE(9:9)='T': sizes of the sub-domains, profiling of
    their CPU times of assembly, of factorization…      [If proc. 0, O/N]

FETINL
  Loop on the sub-domains concerned by the running proc.
    · Construction of the vector e^0 := [f_1^T B_1 … f_q^T B_q]^T
      (K24ER='&&FETINL.E.R').                                     [P/P]
  End of loop

FETINL / FETMPI
  · Construction of the complete e^0 by reduction towards proc. 0.
    [In parallel, O/N, MPI_REDUCE + MPI_SUM]

FETINL / FETREX and BLAS DAXPY (if STOCKAGE_GI='OUI'), LAPACK DSPTRF/S
  · Computation of the initial interface Lagrange vector
    lambda^0 := G_I [G_I^T G_I]^(-1) e^0 (VLAGI/K24LAI/ZR(IVLAGI)).
    [If proc. 0, O/N]
FETINL / FETMPI
  · Distribution of lambda^0 to all the procs.
    [In parallel, O/O, MPI_BCAST]

FETRIN OPTION=1
  Loop on the sub-domains concerned by the running proc. (BLAS DAXPY,
  FETREX, RLTFR8)
    · Computation of the initial residual
      r^0 := sum_i R_i (K_i)^+ (f_i - R_i^T lambda^0)
      (K24IRR='&&FETI.RESIDU.R'/ZR(IRR)).                         [P/P]
  End of loop

FETRIN OPTION=1 / FETMPI
  · Construction of the complete r^0 by reduction towards proc. 0.
    [In parallel, O/N, MPI_REDUCE + MPI_SUM]

FETPRJ OPTION=1 / BLAS DGEMV/DCOPY, LAPACK DSPTRS, FETREX and BLAS DAXPY
(if STOCKAGE_GI='OUI')
  · Computation of the initial projected residual
    g^0 := P r^0 = (I - G_I [G_I^T G_I]^(-1) G_I^T) r^0
    (K24IRG='&&FETI.REPROJ.G'/ZR(IRG)).                  [If proc. 0, O/N]

FETPRJ / FETMPI
  · Distribution of g^0 to all the procs.
    [In parallel, O/O, MPI_BCAST]

FETSCA
  · Scaling of the initial projected residual g~^0 = A g^0
    (K24IR1/ZR(IR1)).                                             [O/O]

FETPRC
  Loop on the sub-domains concerned by the running proc. (BLAS DAXPY,
  FETREX, MRMULT)
    · Computation of the initial preconditioned projected residual
      M^(-1) g~^0 (K24IR2='&&FETI.VECNBI.AUX2'/ZR(IR2)).          [P/P]
  End of loop

FETPRC / FETMPI
  · Construction of the complete preconditioned residual by reduction
    towards proc. 0.              [In parallel, O/N, MPI_REDUCE + MPI_SUM]

FETSCA
  · Scaling of the initial preconditioned projected residual
    h~^0 = A M^(-1) g~^0 (K24IR3='&&FETI.VECNBI.AUX3'/ZR(IR3)).
                                                         [If proc. 0, O/N]

FETPRJ OPTION=1 / BLAS DGEMV/DCOPY, LAPACK DSPTRS, FETREX and BLAS DAXPY
(if STOCKAGE_GI='OUI')
  · Computation of h^0 := P h~^0 (K24IRH='&&FETI.REPCPJ.H'/ZR(IRH)).
                                                         [If proc. 0, O/N]

FETPRJ / FETMPI
  · Distribution of h^0 to all the procs.
    [In parallel, O/O, MPI_BCAST]
BLAS DCOPY
  · The variable p^0 receives h^0 (K24IRP='&&FETI.DD.P'/ZR(IRP)).  [O/O]

BLAS DDOT, DNRM2
  · Computation of the numerator of the descent parameter
    rho_N^0 := g^0 . p^0 (ALPHAN),                       [If proc. 0, O/N]
  · Computation of the initial norm of the projected residual ||g^0||
    (ANORM) and of the stopping criterion
    epsilon := RESI_RELA ||g^0|| (EPSIK).                [If proc. 0, O/N]

FETMPI
  · Distribution of ANORM to all the procs and computation of EPSIK.
    [In parallel, O/O, MPI_BCAST]

  · Preparation of the JEVEUX object CRITER.                      [O/O]

Stopping test if the residual g^0 is quasi-null (i.e. lower than
R8MIEM()**(2.D0/3.D0)):
  FETRIN OPTION=2 / FETPRJ OPTION=2
    FETPRJ
      · Computation of alpha := [G_I^T G_I]^(-1) G_I^T r^0
        (K24ALP='&&FETI.ALPHA.MCR'),                     [If proc. 0, O/N]
    FETMPI
      · Distribution of alpha to all the procs (variables
        K24ALP/ZR(IALPHA)).                 [In parallel, O/O, MPI_BCAST]
    Loop on the sub-domains concerned by the running proc. (BLAS DAXPY,
    FETREX, RLTFR8)
      · Computation of the local solution
        u_i^sol := (K_i)^+ (f_i - R_i^T lambda^0) - B_i alpha
        (K24IRR/ZR(IRR)),                                         [P/P]
      Sub-loops on the physical and late LIGRELs of the local CHAM_NO
        · Reconstruction of the slave solution CHAM_NO u_i^sol specific
          to the sub-domain (CHAMLS/ZR(IVALCS)),                  [P/P]
        · Reconstruction of the master solution CHAM_NO u^sol associated
          with the proc. For the physical nodes, their contribution is
          added after being divided beforehand by the geometrical
          multiplicity of the said node (K24VAL/ZR(IVALS)).       [P/P]
    End of the loops
    FETMPI
      · Construction of the complete u^sol by reduction towards proc. 0.
        [In parallel, O/N, MPI_REDUCE + MPI_SUM]

FETARP / FETPRJ, FETFIV, ARPACK DNAUPD/DNEUPD
  · Test of the positive definiteness of the interface operator P F_I P
    if INFO_FETI(7:7)='T' ([U4.50.01 §3.5]).              [In sequential]

BLAS DCOPY
  · Allocation of the large objects related to the reorthogonalization:
    K24PSR='&&FETI.PS.REORTHO.R', K24DDR='&&FETI.DD.REORTHO.R',
    K24FIR='&&FETI.FIDD.REORTHO.R'.                      [If proc. 0, O/N]

Loop on the GCPPC iterations
  · FETI algorithm level 2, see table [Table 2-3].

Table 2-2: Detailed flow chart and call tree in ALFETI (level 1)
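As an illustration of the MPI_ALLREDUCE + MPI_SUM step of [Table 2-2] (making the number of rigid
body modes of every sub-domain known to all the procs), here is a minimal hedged sketch; the names
(ALLNBM, NBLOC, NBGLO) are hypothetical and this is not the actual FETMPI code:

      SUBROUTINE ALLNBM(NBLOC, NBGLO, NBSD)
C     Sketch only (hypothetical names): NBLOC(i) holds the number of
C     rigid body modes of sub-domain i if it belongs to the running
C     proc., 0 otherwise; after the sum-reduction all the procs know
C     the counts of every sub-domain in NBGLO.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER NBSD, IER
      INTEGER NBLOC(NBSD), NBGLO(NBSD)
      CALL MPI_ALLREDUCE(NBLOC, NBGLO, NBSD, MPI_INTEGER,
     &                   MPI_SUM, MPI_COMM_WORLD, IER)
      END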
Calling routine | Called routines (level 1, level 2) | Details | Processor information

ALFETI / BLAS DCOPY
  · If reorthogonalization, storage of the descent direction p^k in
    K24DDR.                                              [If proc. 0, O/N]

Loop on the GCPPC iterations
  FETFIV
    Loop on the sub-domains concerned by the running proc. (BLAS DAXPY,
    FETREX, RLTFR8)
      · Computation of the result of the FETI interface operator applied
        to the descent direction z^k := sum_i R_i (K_i)^+ R_i^T p^k
        (K24IRZ='&&FETI.FIDD.Z'/ZR(IRZ)).                         [P/P]
    End of loop

  FETFIV / FETMPI
    · Construction of the complete z^k by reduction towards proc. 0.
      [In parallel, O/N, MPI_REDUCE + MPI_SUM]

  BLAS DCOPY
    · If reorthogonalization, storage of z^k in K24FIR.  [If proc. 0, O/N]

  BLAS DDOT
    · Computation of the denominator of the current descent parameter
      rho_D^k := z^k . p^k (ALPHAD),                     [If proc. 0, O/N]
    · Computation of the current descent parameter
      alpha^k := rho_N^k / rho_D^k (ALPHA),                        [Idem]
    · If reorthogonalization, storage of ALPHAD in K24PSR.
                                                         [If proc. 0, O/N]

  FETTOR / FETPRJ, BLAS DDOT, DCOPY
    · Test of the orthogonality properties of the GCPPC if
      INFO_FETI(8:8)='T' ([U4.50.01 §3.5]).              [If proc. 0, O/N]

  BLAS DAXPY
    · Update of the current interface Lagrange vector
      lambda^(k+1) = lambda^k + alpha^k p^k (K24LAI/ZR(IVLAGI)),
                                                         [If proc. 0, O/N]
  FETPRJ OPTION=1
    · Computation of the projected intermediate r~1^k = P z^k
      (K24IR1='&&FETI.VECNBI.AUX1'/ZR(IR1)),             [If proc. 0, O/N]
  BLAS DAXPY
    · Update of the projected residual vector
      g^(k+1) = g^k - alpha^k r~1^k (ZR(IRG)).           [If proc. 0, O/N]

  FETMPI
    · Distribution of g^(k+1) to all the procs.
      [In parallel, O/O, MPI_BCAST]

  BLAS DNRM2
    · Computation of the norm of the projected residual ||g^(k+1)||
      (ANORM).                                                    [O/O]

  Stopping test if ||g^(k+1)|| < EPSIK:                           [O/O]
    FETRIN OPTION=1
      Loop on the sub-domains concerned by the running proc. (BLAS DAXPY,
      FETREX, RLTFR8)
        · Recomputation of the residual with the solution interface
          vector r^sol := sum_i R_i (K_i)^+ (f_i - R_i^T lambda^(k+1))
          (K24IRR/ZR(IRR)),                                       [P/P]
      End of loop
    FETRIN OPTION=1 / FETMPI
      · Construction of the complete r^sol by reduction towards proc. 0.
        [In parallel, O/N, MPI_REDUCE + MPI_SUM]
    FETRIN OPTION=2 / FETPRJ OPTION=2
      · Computation of u^sol and reconstruction of the master and slave
        solution CHAM_NOs as in the stopping test of table [Table 2-2].
                                    [Cf. the stopping test of Table 2-2]

  FETSCA
    · Scaling of the current projected residual g~^(k+1) = A g^(k+1)
      (K24IR1/ZR(IR1)).                                           [O/O]

  FETPRC
    Loop on the sub-domains concerned by the running proc. (BLAS DAXPY,
    FETREX, MRMULT)
      · Computation of the current preconditioned projected residual
        M^(-1) g~^(k+1) (K24IR2='&&FETI.VECNBI.AUX2'/ZR(IR2)).    [P/P]
    End of loop

  FETPRC / FETMPI
    · Construction of the complete preconditioned residual by reduction
      towards proc. 0.            [In parallel, O/N, MPI_REDUCE + MPI_SUM]

  FETSCA
    · Scaling of the current preconditioned projected residual
      h~^(k+1) = A M^(-1) g~^(k+1)
      (K24IR3='&&FETI.VECNBI.AUX3'/ZR(IR3)).             [If proc. 0, O/N]

  FETPRJ OPTION=1
    · Computation of the current projected h^(k+1) = P h~^(k+1)
      (K24IRH/ZR(IRH)).                                  [If proc. 0, O/N]

  FETREO / BLAS DDOT, DAXPY, DCOPY
    · Update of the current descent direction p^(k+1) (ZR(IRP)),
      reorthogonalized or not with respect to the preceding directions,
                                                         [If proc. 0, O/N]
    · Computation of the numerator of the current descent parameter
      rho_N^(k+1) := g^(k+1) . p^(k+1) (ALPHAN).         [If proc. 0, O/N]

  FETREO / FETMPI
    · Distribution of ZR(IRP) to all the procs.
      [In parallel, O/O, MPI_BCAST]

End of the GCPPC loop
  · Cleaning of the JEVEUX objects according to the option and to the
    number of procs.

Table 2-3: Detailed flow chart and call tree in ALFETI (level 2)
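The objects K24DDR, K24FIR and K24PSR store, respectively, the previous directions p^j, the associated
products z^j and the dot products rho_D^j = z^j . p^j. Assuming the standard full reorthogonalization of the
GCPPC described in [R6.01.03] (recalled here only as a reminder; the exact variant actually applied is
selected by REORTH/NBREOR), the FETREO update then reads:

\[
p^{k+1} := h^{k+1} \;-\; \sum_{j \le k} \frac{h^{k+1}\cdot z^{j}}{p^{j}\cdot z^{j}}\; p^{j},
\qquad
\rho_N^{k+1} := g^{k+1}\cdot p^{k+1}.
\]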
3  Implementation of the parallelism

First of all, the FETI algorithm was coded in sequential mode; this implementation was then adapted to
support message-passing parallelism with MPI-1. Indeed, the priority was to measure the impact of such a
multi-domain solver on the architecture and the SDs of the code, to limit the consequences (readability,
efficiency, maintainability) and to make sure of its correct operation on standard cases. Moreover, for many
authors, such a solver often proves very efficient (in CPU and in memory occupation), even in sequential
mode, as the number of DOFs grows (cf. [bib2]).
The parallelization strategy was then the following:
· Upstream of the main operator calling the FETI solver (MECA_STATIQUE…), all the processors carry
out the same sequence of operations and therefore know the same JEVEUX objects: mesh, materials,
fields resulting from the pre-processing, SD FETI… This is relatively sub-optimal but, taking into account
the architecture of the code and its current use, it is the only possible option. It nevertheless has the
merit of not impacting the sequential code and, when the pre-processing steps are not very greedy in
CPU and in memory compared to the solver, it is also often the strategy retained by the developers of
parallel codes.
· Once in the main operator, the operations carried out jointly by the processors are steered by
calibrating the volume of data assigned to each of them: from the preparation of the solver data to the
symbolic and numerical factorizations, passing through the assemblies (of the matrix and of the
contributions to the second members) and, of course, the resolution algorithm itself. This is done very
simply, without any particular message passing, via the object "&&FETI.LISTE.SD.MPI" which filters the
loops on the sub-domains:
      CALL JEVEUO('&&FETI.LISTE.SD.MPI', 'L', ILIMPI)
      DO 50 I = 1, NBSD                <loop on the sub-domains>
         IF (ZI(ILIMPI+I) .EQ. 1) THEN
            ....                       <the expected instructions are carried out only
                                        if the sub-domain belongs to the perimeter of
                                        the running proc.>
         ENDIF
   50 CONTINUE
Concerning the large usual JEVEUX objects, each proc. will thus build only the data it needs: the master
SOLVEUR SD and the slave ones depending on the perimeter of the running processor, and the same thing
for the NUME_DDL, the MATR_ASSE and the CHAM_NO. On the other hand, the small-volume data are
computed by all the procs, because they often steer the calculation and it is of course important that all the
procs follow the same software path.
Moreover, since the NUME_DDL of each sub-domain is known only by its host processor, the message
exchanges are done with vectors of homogeneous sizes: the number of interface DOFs during the course
of the algorithm, or the total number of DOFs during the final reconstruction phase. The master processor
manages the reorthogonalization and projection stages and their (potentially) large associated JEVEUX
objects.
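The exchange pattern used for these interface vectors (reduction of the local contributions towards proc. 0,
which applies the projection/preconditioning, followed by a broadcast of the result) can be sketched as
follows; this is only an illustration with hypothetical names (REDBCA, VLOC, VGLO), not the actual FETMPI
interface:

      SUBROUTINE REDBCA(VLOC, VGLO, NBI)
C     Sketch only (hypothetical names): reduction of the local
C     contributions towards proc. 0, then broadcast of the result.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER NBI, IER
      REAL*8 VLOC(NBI), VGLO(NBI)
C     Sum on proc. 0 of the contributions of all the processors
      CALL MPI_REDUCE(VLOC, VGLO, NBI, MPI_DOUBLE_PRECISION,
     &                MPI_SUM, 0, MPI_COMM_WORLD, IER)
C     ... proc. 0 works on VGLO (projection, preconditioning...) ...
C     Broadcast of the updated vector to all the processors
      CALL MPI_BCAST(VGLO, NBI, MPI_DOUBLE_PRECISION, 0,
     &               MPI_COMM_WORLD, IER)
      END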
The communication cost is roughly the following:
· Initialization: 3 MPI_REDUCE (size nbi, where nbi is the number of interface Lagranges, i.e. the size
  of the FETI problem to be solved) + 4 MPI_BCAST (size nbi) + 1 MPI_GATHERV,
· At each GCPPC iteration: 2 MPI_REDUCE (size nbi) + 2 MPI_BCAST (size nbi),
· Final reconstruction: 2 MPI_REDUCE (size nbi) + 2 MPI_BCAST (size nbi) + 1 MPI_REDUCE (size
  nbddl, where nbddl is the total number of unknowns (physical and late) of the problem).
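As an order of magnitude (illustrative figures, not a measurement): one GCPPC iteration thus moves about
(2 + 2) x nbi double-precision values, i.e. roughly 32 x nbi bytes; for nbi = 10^5 interface Lagranges this is
about 3.2 MB per iteration, to be compared with the single nbddl-sized reduction paid once at the final
reconstruction.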
Rather than loops of point-to-point communications between the slave processors and the master
(MPI_SEND/RECV), collective communications (MPI_REDUCE…) were retained in a first approach; they
encapsulate the former and handle the synchronization and buffering issues in a transparent way. This
ensures better readability, maintainability and portability; on the other hand, they cannot be optimized by
overlapping computations and communications, by limiting latencies or by tuning the buffering.
However, in view of the current software architecture, it seems that these optimizations are not that
promising, because they strongly depend on the machine configuration, on the network and its interface
card, on the MPI implementation and on the type of problem. Gains should probably rather be sought on
the side of a purely parallel implementation of the algorithm (without the master proc./slave procs view),
where the message exchanges would be limited to neighbouring sub-domains and would involve smaller
data flows.
4  Bibliography

[1] O. BOITEAU: Domain decomposition and parallelism in structural mechanics: state of the art and
    benchmark for a reasoned implementation in Code_Aster. EDF-R&D internal note HI-23/03/009 (2003).
[2] O. BOITEAU: 2004 management report for the UMR EDF-CNRS-LaMSID. EDF-R&D internal report
    CR-I23/2005/006 (2005).