.TH HPL_pdfact 3 "September 10, 2008" "HPL 2.0" "HPL Library Functions"
.SH NAME
HPL_pdfact \- recursive panel factorization.
.SH SYNOPSIS
\fB\&#include "hpl.h"\fR

\fB\&void\fR
\fB\&HPL_pdfact(\fR
\fB\&HPL_T_panel *\fR
\fI\&PANEL\fR
\fB\&);\fR
.SH DESCRIPTION
\fB\&HPL_pdfact\fR
recursively factorizes a 1-dimensional panel of columns.
The RPFACT function pointer specifies the recursive algorithm to be
used: Crout, left-looking or right-looking. NBMIN sets the
recursive stopping criterion in terms of the number of columns in the
panel, and NDIV specifies the number of subpanels each panel
should be divided into. Usually a value of 2 is chosen. Finally,
PFACT is a function pointer specifying the non-recursive algorithm
to be used on at most NBMIN columns. Here too, one can choose between
Crout, left-looking and right-looking. Empirical tests seem to indicate
that values of 4 or 8 for NBMIN give the best results.
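
As a rough illustration of how NBMIN and NDIV interact, the following C
sketch walks a recursion tree and records the width of each leaf panel,
that is, each panel that would be handed to the non-recursive PFACT
routine. The splitting rule (subpanel widths rounded up to a multiple of
NBMIN) and the names subpanel_width and split_panel are assumptions made
for illustration; they are not taken from the HPL sources.

```c
/* Hypothetical sketch, not HPL code: width of one subpanel when a panel
 * of n columns is cut into at most ndiv pieces, each a multiple of nbmin. */
static int subpanel_width(int n, int nbmin, int ndiv)
{
    int nblocks = (n + nbmin - 1) / nbmin;          /* ceil(n / nbmin) */
    return ((nblocks + ndiv - 1) / ndiv) * nbmin;   /* ceil(nblocks / ndiv) blocks */
}

/* Recursively split a panel of n columns. Leaf widths are written to
 * leaves[] (at most cap entries are stored); count is the number of
 * leaves seen so far. Returns the updated leaf count. */
int split_panel(int n, int nbmin, int ndiv, int *leaves, int cap, int count)
{
    /* Stopping criterion: at most nbmin columns left. Invalid parameters
     * (nbmin < 1 or ndiv < 2) cannot make progress, so they also stop. */
    if (nbmin < 1 || ndiv < 2 || n <= nbmin) {
        if (count < cap)
            leaves[count] = n;
        return count + 1;
    }

    int jb = subpanel_width(n, nbmin, ndiv);
    while (n > 0) {
        int w = (n < jb) ? n : jb;                  /* last piece may be narrower */
        count = split_panel(w, nbmin, ndiv, leaves, cap, count);
        n -= w;
    }
    return count;
}
```

Under these assumptions, a 16-column panel with NBMIN = 4 and NDIV = 2 is
split into two 8-column subpanels, each of which is split again into two
4-column leaves.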

Bi-directional exchange is used to perform the swap::broadcast
operations at once for one column in the panel. This results in a
lower number of slightly larger messages than usual. On P processes
and assuming bi-directional links, the running time of this function
can be approximated by (when N is equal to N0):

   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3

where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm indeed comes close to
achieving Level 3 BLAS performance in the panel factorization. On a
large number of modern machines, however, this operation is latency
bound, meaning that its cost can be estimated by the latency
portion alone, N0 * log_2(P) * lat. Mono-directional links will double
this communication cost.
.SH ARGUMENTS
.TP 8
PANEL (local input/output) HPL_T_panel *
On entry, PANEL points to the data structure containing the
panel information.
.SH SEE ALSO
.BR HPL_dlocmax \ (3),
.BR HPL_dlocswpN \ (3),
.BR HPL_dlocswpT \ (3),
.BR HPL_pdmxswp \ (3),
.BR HPL_pdpancrN \ (3),
.BR HPL_pdpancrT \ (3),
.BR HPL_pdpanllN \ (3),
.BR HPL_pdpanllT \ (3),
.BR HPL_pdpanrlN \ (3),
.BR HPL_pdpanrlT \ (3),
.BR HPL_pdrpancrN \ (3),
.BR HPL_pdrpancrT \ (3),
.BR HPL_pdrpanllN \ (3),
.BR HPL_pdrpanllT \ (3),
.BR HPL_pdrpanrlN \ (3),
.BR HPL_pdrpanrlT \ (3).