Code_Aster ® Version 8.1
Title: Measuring CPU performance on AlphaServer or Linux
Date: 02/11/05
Author(s): M. ABBAS, J. PELLET
Key: D1.05.01-B
Organization(s): EDF-R&D/AMA
Handbook of Descriptif Informatique
D1.05 booklet: -
Document D1.05.01
Measuring CPU performance on AlphaServer or on Linux
Summary:
Tools are available for tracing the CPU time consumed in Code_Aster (profiling).
On AlphaServer, these tools do not require the Aster sources to be recompiled: one uses the atom tool. The
drawback of this AlphaServer-specific tool is that instrumenting the executable adds an execution overhead
that can be very large (up to 10 times the original cost). Under these conditions, it is difficult to be
sure that the measurement is relevant.
On Linux, one uses the traditional method: all the sources are recompiled with the "-pg" option and the
gprof tool is used. The overhead of the instrumentation is negligible.
Handbook of Descriptif Informatique
D1.05 booklet: -
HT-66/05/003/A
1 On AlphaServer
1.1 Preparing the profiling
Start from the executable that you intend to use to run your Aster study:
· Native executable: copy the aster executable into a local directory of your own on the server (the
native executable is in /aster/v7/NEW7/ on the server and is named asterd or asteru, depending on
whether or not it is the debug version).
· Private executable: prepare your overload as usual with ASTK or run_aster and build your executable.
The executable must then be modified with the atom tool.
In the directory containing the Aster executable that you want to profile:
atom -tool hiprof votre_executable
The program creates a new executable named <votre_executable.hiprof>
1.2 Running the profiling
With ASTK or run_aster, the new executable must be used as an overload. It is essential to modify the
Aster launch script because, during a profiled run, the file <votre_executable.hiout> is created in the
temporary working directory of the calculation. It must therefore be copied to a suitable directory.
For ASTK:
1) Prepare the study (files, catalogue overloads, new "profiled" executable, bases, time, memory and
miscellaneous options).
2) Add a btc script under RESULTAT in the SURCHARGE tab.
3) Launch the calculation. The calculation will not be carried out (a dialog box informs you of this),
but the btc script will be created.
4) Edit the btc script and add the following line at the end:
cp votre_executable.hiout /chez_vous/votre_executable.hiout
Caution: after any modification of the execution profile (in particular time and memory), the btc
script must be recreated and modified again.
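Since step 4 has to be repeated every time the btc script is regenerated, it can be automated. The following is only a minimal sketch in Python: the function name append_copy_line and the placeholder script content are assumptions made for the illustration, and the real content of a btc script is not described here.

```python
def append_copy_line(script_text: str, hiout_name: str, target_dir: str) -> str:
    """Return the script text with a cp line for the .hiout file added at the end."""
    copy_line = f"cp {hiout_name} {target_dir.rstrip('/')}/{hiout_name}"
    # Make sure the existing script ends with a newline before appending.
    if script_text and not script_text.endswith("\n"):
        script_text += "\n"
    return script_text + copy_line + "\n"


# Placeholder content: the real btc script is generated by ASTK.
original = "# ... generated btc content ...\n"
patched = append_copy_line(original, "votre_executable.hiout", "/chez_vous")
print(patched.splitlines()[-1])
```

The function only works on text, so reading and writing the actual btc file is left to the caller.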
After execution, you have two files:
votre_executable.hiout
votre_executable.hiprof
These two files must be in the same directory. Then run gprof, redirecting the standard output:
gprof votre_executable votre_executable.hiout > ResultatProfil
You now have a file <ResultatProfil> containing the result of the analysis.
For the available options, see man gprof. Some useful options:
gprof -a
Suppresses the display of static functions, in particular the system calls that clutter the file
gprof -a -F jeveuo_
Limits the display to the designated function
Caution:
For a FORTRAN routine, you must add an _ (underscore) at the end of the routine name and remove the
.f extension
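The naming rule above (drop the .f extension, add a trailing underscore) can be written as a small helper. This is an illustrative sketch only; the function name fortran_symbol is hypothetical.

```python
def fortran_symbol(source_name: str) -> str:
    """Turn a FORTRAN source file name such as 'jeveuo.f' into 'jeveuo_'."""
    # Remove the .f extension if it is present.
    name = source_name[:-2] if source_name.endswith(".f") else source_name
    # Add the trailing underscore unless the name already carries one.
    return name if name.endswith("_") else name + "_"


print(fortran_symbol("jeveuo.f"))
```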
1.3 Analysing the profiling results
By default, the file is large. The amount of information displayed can be limited with the gprof options.
"System times" are expressed as a number of executed instructions.
The file is described below, starting from the end:
***************************************************************************
Index by function name
[401]  PyArg_Parse           [591] cftabl_   [1000] proc_at_0x1213acb50
[212]  PyArg_ParseTuple      [84]  cftyli_   [660]  proc_at_0x1213ad470
[1137] PyArg_ParseTupleAnd   [310] cgmacy_   [453]  proc_at_0x1213ad560
[1605] PyBuffer_FromObject   [79]  charme_   [680]  proc_at_0x1213aeac0
[1256] PyCFunction_Fini      [476] chlici_   [1221] proc_at_0x1213aedc0
[531]  PyCFunction_New       [190] chloet_   [217]  proc_at_0x1213b18e0
[1549] PyCObject_AsVoidPtr   [226] chmano_   [629]  proc_at_0x1213b1e00
Each function called during execution is identified by a number in square brackets.
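To look up the number of a given function without scanning the index by eye, the index lines can be parsed. The sketch below assumes that every entry has the form "[number] name", as in the excerpt; the names INDEX_ENTRY and parse_index are hypothetical.

```python
import re

# One index entry: a number in square brackets followed by a symbol name.
INDEX_ENTRY = re.compile(r"\[(\d+)\]\s+(\S+)")


def parse_index(lines):
    """Map each function name to its bracketed index number."""
    index = {}
    for line in lines:
        for number, name in INDEX_ENTRY.findall(line):
            index[name] = int(number)
    return index


sample = [
    "[401] PyArg_Parse [591] cftabl_ [1000] proc_at_0x1213acb50",
    "[212] PyArg_ParseTuple [84] cftyli_ [660] proc_at_0x1213ad470",
]
index = parse_index(sample)
print(index["cftabl_"])
```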
Just above this index is the following table:
***************************************************************************
granularity: instructions; units: inst's; total: 201924201580.70 inst's
 <A>          <B>          <C>  <D>       <E>       <F>  <G>
49.6 100384307222 100384307222  161 623505013 623596299  tldlr8_ [16]
31.0 163144941823  62760634601  506 124032874 124101882  rldlr8_ [17]
This table summarizes the most expensive functions.
COLUMN <A>: percentage of the total number of instructions of the execution that is carried out by this
function.
COLUMN <B>: cumulative number of instructions for this function and those that precede it.
COLUMN <C>: number of instructions for this function.
COLUMN <D>: number of calls to this function.
COLUMN <E>: ratio of column <C> to column <D> (average number of instructions per call of the
function).
COLUMN <F>: average number of instructions per call of the function and of its descendants.
COLUMN <G>: name of the function and its index number (in square brackets).
In this example, the function tldlr8 accounts for 49.6% of the total of the calculation while being called 161 times.
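The relations between the columns can be checked on the two rows shown. The sketch below is illustrative only: the FlatRow type and the parse_flat_row function are hypothetical, and the parsing assumes rows made of exactly eight whitespace-separated fields, as in the excerpt.

```python
from typing import NamedTuple


class FlatRow(NamedTuple):
    percent: float       # column <A>
    cumulative: int      # column <B>
    self_count: int      # column <C>
    calls: int           # column <D>
    per_call: int        # column <E>
    per_call_total: int  # column <F>
    name: str            # column <G>, without the [index]


def parse_flat_row(line: str) -> FlatRow:
    """Split one row of the table into its typed columns."""
    a, b, c, d, e, f, name, _index = line.split()
    return FlatRow(float(a), int(b), int(c), int(d), int(e), int(f), name)


rows = [parse_flat_row(line) for line in (
    "49.6 100384307222 100384307222 161 623505013 623596299 tldlr8_ [16]",
    "31.0 163144941823 62760634601 506 124032874 124101882 rldlr8_ [17]",
)]

for row in rows:
    # In these two rows, <E> is the integer part of <C> divided by <D>.
    print(row.name, row.self_count // row.calls == row.per_call)

# Column <B> accumulates column <C> from the top of the table.
print(rows[0].cumulative + rows[1].self_count == rows[1].cumulative)
```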
***************************************************************************
Finally, at the beginning of the file is the complete call tree. It is sorted either by call order
(starting from main and working downwards) or by function (see the gprof options).
***************************************************************************
Let us take the example of tldlr8:
<A>     <B>               <C>           <D>             <E>  <F>
              100263313681.76   14679301.29         161/161  tldlgg_ [15]
[16]   49.7   100263313681.76   14679301.29             161  tldlr8_ [16]
                   3129121.03    6207534.02      4485/30537  __upcUpcall [352]
                     35974.59    2749927.50      522/195235  jelibe_ [65]
                    192341.36    1770419.18     1005/775659  jeveuo_ [56]
                     47302.73     140745.02      161/202579  jedema_ [102]
                     18938.92     126525.05       322/63148  jeexin_ [196]
                     27722.26      85430.33        94/49118  jeecra_ [154]
                     17033.41      67779.29        94/13206  jecreo_ [257]
                     45068.75         84.88    1044/1075446  jexnum_ [163]
                     13618.68       2023.63      161/202581  jemarq_ [205]
                      1710.66          0.00        161/3481  infniv_ [853]
Each entry of the call tree is identified by the number in square brackets on the left. Here, the number
[16] designates the function tldlr8_ (as indicated in the index at the end of the file).
It is the reference function (the node of the tree).
The lines above it are the callers of this function (the parent functions); the lines below it are the
functions that it calls (the child functions).
Each function has two main figures: the number of instructions executed in the function itself
("terminal" FORTRAN instructions) and the number of instructions executed in its child functions.
Parent function
Parent function
…
Reference function
Child function
Child function
Child function
Child function
…
For the reference function:
COLUMN <A>: index number of the reference function.
COLUMN <B>: the figure 49.7 is the percentage of the total number of instructions of the execution
carried out by this reference function (same definition as in the previous table).
COLUMN <C>: number of instructions for the reference function itself.
COLUMN <D>: number of instructions for the child functions of the reference function.
COLUMN <E>: number of times the function was called.
COLUMN <F>: name of the reference function.
For the parent functions and the child functions:
COLUMN <A>: empty.
COLUMN <B>: empty.
COLUMN <C>: for a child function, number of instructions executed in the child itself when it is called
from the reference function; for a parent function, number of instructions of the reference function
itself attributed to its calls from this parent.
COLUMN <D>: the same quantity for the descendants (of the child function, or of the reference function
when it is called from this parent).
COLUMN <E>: gives two figures a/b whose meaning depends on the type of function (parent or child with
respect to the reference function):
· For the parent functions (above the reference function), a/b:
<a> is the number of times the reference function was called by this parent function, out of the
total number <b> of calls to the reference function.
· For the child functions (below the reference function), a/b:
<a> is the number of times the child function was called by the reference function, out of the
total number <b> of calls to the child function.
COLUMN <F>: name of the function.
Note:
· If the number of instructions for the descendants of a function is zero, the function makes no
further calls. It is a leaf of the tree: it contains only basic FORTRAN instructions (this is the case
of infniv, for example).
· For a given reference function, the sum of the <a> values in column <E> of its parent functions gives
the total number of calls to the reference function.
· For a given reference function, the sum of columns <C> and <D> over its child functions gives the
figure in column <D> of the reference function.
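The last two remarks can be verified on the tldlr8_ excerpt. The sketch below copies the figures of the excerpt by hand and uses decimal arithmetic so that the comparison is exact; the variable names are chosen for the illustration.

```python
from decimal import Decimal

# (column <C>, column <D>) of each child line of tldlr8_ in the excerpt.
children = [
    ("3129121.03", "6207534.02"),   # __upcUpcall
    ("35974.59", "2749927.50"),     # jelibe_
    ("192341.36", "1770419.18"),    # jeveuo_
    ("47302.73", "140745.02"),      # jedema_
    ("18938.92", "126525.05"),      # jeexin_
    ("27722.26", "85430.33"),       # jeecra_
    ("17033.41", "67779.29"),       # jecreo_
    ("45068.75", "84.88"),          # jexnum_
    ("13618.68", "2023.63"),        # jemarq_
    ("1710.66", "0.00"),            # infniv_
]
reference_children_total = Decimal("14679301.29")  # column <D> of tldlr8_ [16]

# Sum of columns <C> and <D> over all the child lines.
total = sum((Decimal(c) + Decimal(d) for c, d in children), Decimal("0"))
print(total == reference_children_total)

# The only parent line (tldlgg_) shows 161/161 in column <E>.
parent_calls = [161]
reference_calls = 161  # column <E> of tldlr8_ [16]
print(sum(parent_calls) == reference_calls)
```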
Analysis of the example
In the example presented, the function tldlr8 is expensive since, on its own, it accounts for nearly
half of the total number of instructions of the execution. One also sees that it is its own
instructions that take the time, not the calls to its child functions (the ratio between the two
exceeds 1000). As tldlgg is the only function that calls tldlr8, one must look at the call tree of
that function. One then sees that the contact/friction algorithm (fropgd) is the greediest (two thirds
of the calls to tldlgg are made by the contact algorithm).
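The comparison between the instructions of tldlr8_ itself and those of its child functions can be reproduced from columns <C> and <D> of its reference line. This is a short sketch with the two figures copied from the call tree excerpt; the variable names are chosen for the illustration.

```python
from decimal import Decimal

self_instructions = Decimal("100263313681.76")   # column <C> of tldlr8_ [16]
child_instructions = Decimal("14679301.29")      # column <D> of tldlr8_ [16]

# How many times more instructions are spent in the routine than in its children.
ratio = self_instructions / child_instructions
# Fraction of the routine's inclusive cost spent in its own instructions.
self_share = self_instructions / (self_instructions + child_instructions)

print(f"self / children ratio: {ratio:.0f}")
print(f"share spent in tldlr8_ itself: {self_share:.4%}")
```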
2 On Linux
2.1 Instrumentation with f77 -pg (or cc -pg)
On the Linux/Rocks machine clpaster (the PC cluster of the AMA department), the need to recompile all
the sources is less of a problem than on the AlphaServer: Aster can be recompiled entirely in less than
30 minutes of elapsed time.
To carry out this recompilation with Astk, one must:
· "overload" all the sources (F77 and C). To save time, the F77 sources can be concatenated into
"packages" (of 300 routines, for example).
· modify the file "config.txt" to add the "-pg" option on the following 5 lines:
OPTL | f90 | ? | -v -pg
OPTC_D | cc | ? | -c -g -pg -DP_LINUX
OPTC_O | cc | ? | -c -pg -DP_LINUX
OPTF_D | f90 | ? | -c -g -pg -I/opt/mpich2-1.0.1/include
OPTF_O | f90 | ? | -c -O2 -pg -I/opt/mpich2-1.0.1/include
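The change to config.txt can also be scripted. The sketch below is illustrative only: it assumes that each relevant line has four fields separated by "|", as in the five lines above, the function name add_pg is hypothetical, and the "before" text (including the OTHER line) is invented for the example. Unlike the lines above, it appends "-pg" at the end of the options, which is assumed not to matter to the compiler.

```python
PROFILED_KEYS = {"OPTL", "OPTC_D", "OPTC_O", "OPTF_D", "OPTF_O"}


def add_pg(config_text: str) -> str:
    """Append -pg to the options of the five compile/link lines if it is missing."""
    patched = []
    for line in config_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 4 and fields[0] in PROFILED_KEYS:
            options = fields[3].split()
            if "-pg" not in options:
                options.append("-pg")
            fields[3] = " ".join(options)
            line = " | ".join(fields)
        patched.append(line)
    return "\n".join(patched)


# Invented input: two of the five lines without -pg, plus an unrelated line.
before = (
    "OPTL | f90 | ? | -v\n"
    "OPTF_O | f90 | ? | -c -O2 -I/opt/mpich2-1.0.1/include\n"
    "OTHER | x | ? | -y"
)
print(add_pg(before))
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply to a file that is already instrumented.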
The config.txt file modified in this way makes it possible to instrument the code in both "debug" and
"nodebug" modes. The "nodebug" mode is a priori preferable for measuring the "true" performance of the
code. On the other hand, the "debug" mode is necessary if one wants to know which lines consume the most.
Unfortunately, I observed an unexplained problem in "debug" mode: the profiling result showed call links
between routines that did not exist! One can nevertheless hope that this anomaly does not entirely
invalidate the rest of the measurement.
As an example, I profiled the test ssnv506c and obtained the following overall results:
· in nodebug mode without instrumentation: 138 s
· in nodebug mode with instrumentation: 139 s
· in debug mode without instrumentation: 218 s
· in debug mode with instrumentation: 228 s
The CPU cost of the instrumentation is seen to be negligible.
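The relative overhead implied by these four timings can be computed directly. This is a short sketch using only the figures quoted for ssnv506c; the function name overhead_percent is chosen for the illustration.

```python
# Seconds (without instrumentation, with instrumentation) for the ssnv506c test.
timings = {
    "nodebug": (138, 139),
    "debug": (218, 228),
}


def overhead_percent(without: float, with_instr: float) -> float:
    """Relative extra cost of the instrumented run, in percent."""
    return 100.0 * (with_instr - without) / without


for mode, (without, with_instr) in timings.items():
    print(f"{mode}: {overhead_percent(without, with_instr):.1f} %")
# prints "nodebug: 0.7 %" then "debug: 4.6 %"
```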
2.2 Running the instrumented code with Astk
Once this instrumentation is done, the study to be "profiled" must be run with the executable that has
just been produced. The problem is that running the study produces a file (called gmon.out) in the
temporary execution directory. This file is therefore lost at the end of the execution unless
precautions are taken.
To keep the precious gmon.out file, use Astk in interactive mode and click the "pre" launch button
(instead of the usual "run"). This Astk option prepares the execution environment. One then moves into
the prepared directory and "launches" Aster manually. This is the same facility as the one used for
running under a debugger.
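When Aster is launched manually from the prepared directory, gmon.out can be copied to a safe place as soon as the run ends. The sketch below is only an illustration: the function name save_gmon and the directory names are hypothetical, and the demonstration works on a temporary directory with a dummy file rather than on a real run.

```python
import shutil
import tempfile
from pathlib import Path


def save_gmon(run_dir: str, keep_dir: str) -> Path:
    """Copy gmon.out from the execution directory to a directory that is kept."""
    source = Path(run_dir) / "gmon.out"
    if not source.is_file():
        raise FileNotFoundError(f"no gmon.out in {run_dir}")
    destination_dir = Path(keep_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, destination_dir / "gmon.out"))


# Demonstration on a throwaway directory with a dummy gmon.out.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run"
    run_dir.mkdir()
    (run_dir / "gmon.out").write_bytes(b"dummy profile data")
    saved = save_gmon(str(run_dir), str(Path(tmp) / "kept"))
    print(saved.name, saved.stat().st_size)
```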
2.3 Analysis of the results
Once the study has been run and the "gmon.out" file recovered, this file can be analysed with the
command:
gprof mon_executable gmon.out > listing
The resulting file (listing) is interpreted in the same way as described in [§1.3]. An excellent
document describing the whole profiling process is the one written by Jay Fenlason and Richard Stallman:
"GNU gprof: The GNU Profiler". It is easy to find on the Web.
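The gprof command can also be driven from a script, for example when several studies are profiled in a row. The sketch below assumes that gprof is available on the PATH; the function names gprof_command and write_listing are hypothetical, and only the command construction is exercised here.

```python
import subprocess


def gprof_command(executable, gmon_file="gmon.out"):
    """Build the argument list equivalent to: gprof <executable> <gmon_file>."""
    return ["gprof", executable, gmon_file]


def write_listing(executable, gmon_file="gmon.out", listing="listing"):
    """Run gprof and redirect its standard output to the listing file."""
    with open(listing, "w") as out:
        subprocess.run(gprof_command(executable, gmon_file), stdout=out, check=True)


print(" ".join(gprof_command("mon_executable")))
```

Calling write_listing("mon_executable") in the directory holding gmon.out would then produce the same listing file as the command given above.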
Note:
Even if all the Aster sources are recompiled, the "depth" of the performance analysis stops at the
libraries that are used at link time and were not compiled with "-pg". This is the case, for example,
of the BLAS routines. The time spent in these libraries cannot be attributed to the Aster routines that
call them. This limitation can be significant, for example, when measuring the performance of the MUMPS
or MULT_FRONT solvers, because most of the time is spent in BLAS routines.