.TH HPL_pdrpanllT 3 "September 10, 2008" "HPL 2.0" "HPL Library Functions"
.SH NAME
HPL_pdrpanllT \- Left-looking recursive panel factorization.
.SH SYNOPSIS
\fB\&#include "hpl.h"\fR

\fB\&void\fR
\fB\&HPL_pdrpanllT(\fR
\fB\&HPL_T_panel *\fR
\fI\&PANEL\fR,
\fB\&const int\fR
\fI\&M\fR,
\fB\&const int\fR
\fI\&N\fR,
\fB\&const int\fR
\fI\&ICOFF\fR,
\fB\&double *\fR
\fI\&WORK\fR
\fB\&);\fR
.SH DESCRIPTION
\fB\&HPL_pdrpanllT\fR
recursively factorizes a panel of columns using the recursive
left-looking variant of the one-dimensional algorithm. The
lower triangular N0-by-N0 upper block of the panel is stored in
transposed form.

Bi-directional exchange is used to perform the swap::broadcast
operations at once for one column in the panel. This results in a
lower number of slightly larger messages than usual. On P processes
and assuming bi-directional links, the running time of this function
can be approximated by (when N is equal to N0):

   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3

where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm makes it possible to come
close to Level 3 BLAS performance in the panel factorization. On a
large number of modern machines, however, this operation is latency
bound, meaning that its cost can be estimated by the latency portion
alone, N0 * log_2(P) * lat. Mono-directional links will double this
communication cost.
.SH ARGUMENTS
.TP 8
PANEL   (local input/output)    HPL_T_panel *
On entry, PANEL points to the data structure containing the
panel information.
.TP 8
M       (local input)           const int
On entry, M specifies the local number of rows of sub(A).
.TP 8
N       (local input)           const int
On entry, N specifies the local number of columns of sub(A).
.TP 8
ICOFF   (global input)          const int
On entry, ICOFF specifies the row and column offset of sub(A)
in A.
.TP 8
WORK    (local workspace)       double *
On entry, WORK is a work array of size at least 2*(4+2*N0).
.SH SEE ALSO
.BR HPL_dlocmax \ (3),
.BR HPL_dlocswpN \ (3),
.BR HPL_dlocswpT \ (3),
.BR HPL_pdmxswp \ (3),
.BR HPL_pdpancrN \ (3),
.BR HPL_pdpancrT \ (3),
.BR HPL_pdpanllN \ (3),
.BR HPL_pdpanllT \ (3),
.BR HPL_pdpanrlN \ (3),
.BR HPL_pdpanrlT \ (3),
.BR HPL_pdrpancrN \ (3),
.BR HPL_pdrpancrT \ (3),
.BR HPL_pdrpanllN \ (3),
.BR HPL_pdrpanrlN \ (3),
.BR HPL_pdrpanrlT \ (3),
.BR HPL_pdfact \ (3).