.TH HPL_pdrpancrN 3 "September 10, 2008" "HPL 2.0" "HPL Library Functions"
.SH NAME
HPL_pdrpancrN \- Crout recursive panel factorization.
.SH SYNOPSIS
\fB\&#include "hpl.h"\fR

\fB\&void\fR
\fB\&HPL_pdrpancrN(\fR
\fB\&HPL_T_panel *\fR
\fI\&PANEL\fR,
\fB\&const int\fR
\fI\&M\fR,
\fB\&const int\fR
\fI\&N\fR,
\fB\&const int\fR
\fI\&ICOFF\fR,
\fB\&double *\fR
\fI\&WORK\fR
\fB\&);\fR
.SH DESCRIPTION
\fB\&HPL_pdrpancrN\fR
recursively factorizes a panel of columns using the
recursive Crout variant of the usual one-dimensional algorithm. The
lower triangular N0-by-N0 block at the top of the panel is stored in
no-transpose form (i.e. just like the input matrix itself).

Bi-directional exchange is used to perform the swap and broadcast
operations at once for one column in the panel. This results in
fewer, slightly larger messages than usual. On P processes
and assuming bi-directional links, the running time of this function
can be approximated by (when N is equal to N0):

   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3

where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm makes it possible to come
close to Level 3 BLAS performance in the panel factorization. On a
large number of modern machines, however, this operation is latency
bound, meaning that its cost can be estimated by the latency
portion N0 * log_2(P) * lat alone. Mono-directional links double this
communication cost.
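.PP
As a rough illustration, the estimate above can be written as a few C
helpers. This is a sketch under stated assumptions and not part of HPL:
the function names (model_ceil_log2, model_comm_time, model_comp_time,
model_panel_time) are hypothetical, log_2( P ) is rounded up to a whole
number of exchange steps (exact when P is a power of two), and gam2-3
is treated as a time per floating point operation.
.nf
```c
/* Hypothetical helpers, not part of HPL: they evaluate the running
 * time estimate from the DESCRIPTION for the case N equal to N0. */

/* Number of exchange steps. Assumption: log_2(P) is rounded up to an
 * integer, which is exact when P is a power of two. */
int model_ceil_log2(int p)
{
    int steps = 0;
    long reach = 1;

    while (reach < p) {
        reach *= 2;
        steps++;
    }
    return steps;
}

/* Communication term: N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth).
 * lat is a time per message, bdwth a number of double precision
 * words per unit of time. */
double model_comm_time(int n0, int p, double lat, double bdwth)
{
    return n0 * model_ceil_log2(p) * (lat + (2.0 * n0 + 4.0) / bdwth);
}

/* Computation term: N0^2 * (M - N0/3) * gam23.
 * Assumption: gam23 is a time per floating point operation. */
double model_comp_time(int m, int n0, double gam23)
{
    return (double)n0 * n0 * (m - n0 / 3.0) * gam23;
}

/* Sum of both terms, as in the estimate. */
double model_panel_time(int m, int n0, int p,
                        double lat, double bdwth, double gam23)
{
    return model_comm_time(n0, p, lat, bdwth)
         + model_comp_time(m, n0, gam23);
}
```
.fi
When lat is much larger than ( 2*N0 + 4 ) / bdwth, model_comm_time
is close to N0 * log_2( P ) * lat, which is the latency-bound case.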
45 |
.SH ARGUMENTS |
46 |
.TP 8 |
47 |
PANEL (local input/output) HPL_T_panel * |
48 |
On entry, PANEL points to the data structure containing the |
49 |
panel information. |
50 |
.TP 8 |
51 |
M (local input) const int |
52 |
On entry, M specifies the local number of rows of sub(A). |
53 |
.TP 8 |
54 |
N (local input) const int |
55 |
On entry, N specifies the local number of columns of sub(A). |
56 |
.TP 8 |
57 |
ICOFF (global input) const int |
58 |
On entry, ICOFF specifies the row and column offset of sub(A) |
59 |
in A. |
60 |
.TP 8 |
61 |
WORK (local workspace) double * |
62 |
On entry, WORK is a workarray of size at least 2*(4+2*N0). |
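.PP
The minimum WORK size stated above can be computed as in the
following sketch. The helper name model_work_len is hypothetical and
not part of HPL; it only evaluates 2*(4+2*N0) as a count of doubles.
.nf
```c
#include <stddef.h>

/* Hypothetical helper, not part of HPL: minimum number of doubles
 * that WORK must hold, i.e. 2*(4+2*N0). */
size_t model_work_len(int n0)
{
    return 2 * (4 + 2 * (size_t)n0);
}
```
.fi
A caller that allocates its own buffer would need at least
model_work_len(N0) * sizeof(double) bytes.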
.SH SEE ALSO
.BR HPL_dlocmax \ (3),
.BR HPL_dlocswpN \ (3),
.BR HPL_dlocswpT \ (3),
.BR HPL_pdmxswp \ (3),
.BR HPL_pdpancrN \ (3),
.BR HPL_pdpancrT \ (3),
.BR HPL_pdpanllN \ (3),
.BR HPL_pdpanllT \ (3),
.BR HPL_pdpanrlN \ (3),
.BR HPL_pdpanrlT \ (3),
.BR HPL_pdrpancrT \ (3),
.BR HPL_pdrpanllN \ (3),
.BR HPL_pdrpanllT \ (3),
.BR HPL_pdrpanrlN \ (3),
.BR HPL_pdrpanrlT \ (3),
.BR HPL_pdfact \ (3).