.TH HPL_pdfact 3 "September 10, 2008" "HPL 2.0" "HPL Library Functions"
.SH NAME
HPL_pdfact \- recursive panel factorization.
.SH SYNOPSIS
\fB\&#include "hpl.h"\fR
 
\fB\&void\fR
\fB\&HPL_pdfact(\fR
\fB\&HPL_T_panel *\fR
\fI\&PANEL\fR
\fB\&);\fR
.SH DESCRIPTION
\fB\&HPL_pdfact\fR
recursively factorizes a 1-dimensional panel of columns.
The RPFACT function pointer specifies the recursive algorithm to be
used: Crout, left-looking or right-looking.  NBMIN sets the recursion
stopping criterion in terms of the number of columns in the panel,
and NDIV specifies the number of subpanels each panel is divided
into; a value of 2 is usually chosen.  Finally, PFACT is a function
pointer specifying the non-recursive algorithm applied to subpanels
of at most NBMIN columns; here too one can choose between Crout,
left-looking or right-looking.  Empirical tests seem to indicate that
values of 4 or 8 for NBMIN give the best results.
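.PP
The recursion described above can be sketched in C for a single process.
This is a minimal illustration under stated assumptions, not the HPL
implementation: the panel is an m x n column-major array held by one
process (no MPI, no HPL_T_panel), only a right-looking recursive variant
is shown, and the base case is an unblocked right-looking LU with partial
pivoting.  The names panel_factor, panel_rec, panel_base, swap_rows and
panel_residual are hypothetical; nbmin and ndiv play the roles of NBMIN
and NDIV.
.nf
```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch, not HPL's API: column-major access to element
 * (i, j) of a panel whose leading dimension is lda. */
#define AT(M, i, j) (M)[(size_t)(i) + (size_t)(j) * (size_t)lda]

/* Swap rows r1 and r2 across all ncols columns of the panel. */
static void swap_rows(double *A, int lda, int ncols, int r1, int r2)
{
    for (int c = 0; c < ncols; ++c) {
        double t = AT(A, r1, c);
        AT(A, r1, c) = AT(A, r2, c);
        AT(A, r2, c) = t;
    }
}

/* Non-recursive (right-looking) LU with partial pivoting of columns
 * [off, off+n), rows [off, m); row swaps are applied to the whole panel. */
static void panel_base(double *A, int lda, int m, int ncols,
                       int off, int n, int *ipiv)
{
    for (int j = off; j < off + n; ++j) {
        int p = j;                                /* pivot search */
        for (int i = j + 1; i < m; ++i)
            if (fabs(AT(A, i, j)) > fabs(AT(A, p, j))) p = i;
        ipiv[j] = p;
        if (p != j) swap_rows(A, lda, ncols, j, p);
        double piv = AT(A, j, j);
        if (piv != 0.0)                           /* scale the column of L */
            for (int i = j + 1; i < m; ++i) AT(A, i, j) /= piv;
        for (int c = j + 1; c < off + n; ++c)     /* rank-1 update */
            for (int i = j + 1; i < m; ++i)
                AT(A, i, c) -= AT(A, i, j) * AT(A, j, c);
    }
}

/* Recursive step: split columns [off, off+n) into ndiv subpanels, factor
 * each one recursively, then update the columns to its right. */
static void panel_rec(double *A, int lda, int m, int ncols, int off, int n,
                      int nbmin, int ndiv, int *ipiv)
{
    if (n <= nbmin) {                             /* stopping criterion */
        panel_base(A, lda, m, ncols, off, n, ipiv);
        return;
    }
    int nb = (n + ndiv - 1) / ndiv;               /* subpanel width >= 1 */
    for (int s = off; s < off + n; s += nb) {
        int jb = (off + n - s < nb) ? off + n - s : nb;
        panel_rec(A, lda, m, ncols, s, jb, nbmin, ndiv, ipiv);
        for (int c = s + jb; c < off + n; ++c) {
            /* U12 := L11^{-1} A12, L11 unit lower triangular */
            for (int k = s; k < s + jb; ++k)
                for (int i = k + 1; i < s + jb; ++i)
                    AT(A, i, c) -= AT(A, i, k) * AT(A, k, c);
            /* A22 := A22 - L21 * U12 */
            for (int k = s; k < s + jb; ++k)
                for (int i = s + jb; i < m; ++i)
                    AT(A, i, c) -= AT(A, i, k) * AT(A, k, c);
        }
    }
}

/* Hypothetical entry point: factor an m x n panel (m >= n) in place. */
void panel_factor(double *A, int lda, int m, int n,
                  int nbmin, int ndiv, int *ipiv)
{
    if (nbmin < 1) nbmin = 1;
    if (ndiv < 2) ndiv = 2;
    panel_rec(A, lda, m, n, 0, n, nbmin, ndiv, ipiv);
}

/* Check: apply the recorded swaps to A0 (a copy of the original panel,
 * overwritten here) and return max |P*A0 - L*U| over all entries. */
double panel_residual(double *A0, const double *LU, int lda, int m, int n,
                      const int *ipiv)
{
    for (int j = 0; j < n; ++j)
        if (ipiv[j] != j) swap_rows(A0, lda, n, j, ipiv[j]);
    double maxerr = 0.0;
    for (int i = 0; i < m; ++i)
        for (int c = 0; c < n; ++c) {
            double s = 0.0;
            for (int k = 0; k <= i && k <= c; ++k)
                s += (k == i ? 1.0 : AT(LU, i, k)) * AT(LU, k, c);
            double e = fabs(s - AT(A0, i, c));
            if (e > maxerr) maxerr = e;
        }
    return maxerr;
}
```
.fi
For example, panel_factor(A, m, m, n, 4, 2, ipiv) factors the panel with
NBMIN = 4 and NDIV = 2, and panel_residual applied to a saved copy of the
original panel checks that P * A = L * U up to rounding.  Splitting with a
ceiling division keeps every subpanel at least one column wide and
strictly narrower than its parent, so the recursion always reaches the
non-recursive base case.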
 
Bi-directional exchange is used to perform the swap::broadcast
operations in a single step for each column of the panel.  This
results in fewer, slightly larger messages than usual.  On P
processes, and assuming bi-directional links, the running time of
this function can be approximated (when N is equal to N0) by:
 
   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3
 
where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
execution time per floating-point operation (the inverse of their
rate of execution).  The recursive algorithm indeed makes it possible
to achieve nearly Level 3 BLAS performance in the panel
factorization.  On a large number of modern machines, however, this
operation is latency bound, meaning that its cost can be estimated by
the latency portion N0 * log_2(P) * lat alone.  Mono-directional links double this
communication cost.
.SH ARGUMENTS
.TP 8
PANEL   (local input/output)    HPL_T_panel *
On entry,  PANEL  points to the data structure containing the
panel information.
.SH SEE ALSO
.BR HPL_dlocmax \ (3),
.BR HPL_dlocswpN \ (3),
.BR HPL_dlocswpT \ (3),
.BR HPL_pdmxswp \ (3),
.BR HPL_pdpancrN \ (3),
.BR HPL_pdpancrT \ (3),
.BR HPL_pdpanllN \ (3),
.BR HPL_pdpanllT \ (3),
.BR HPL_pdpanrlN \ (3),
.BR HPL_pdpanrlT \ (3),
.BR HPL_pdrpancrN \ (3),
.BR HPL_pdrpancrT \ (3),
.BR HPL_pdrpanllN \ (3),
.BR HPL_pdrpanllT \ (3),
.BR HPL_pdrpanrlN \ (3),
.BR HPL_pdrpanrlT \ (3).