.TH HPL_pdpanllT 3 "September 10, 2008" "HPL 2.0" "HPL Library Functions"
.SH NAME
HPL_pdpanllT \- Left-looking panel factorization.
.SH SYNOPSIS
\fB\&#include "hpl.h"\fR
.PP
\fB\&void\fR
\fB\&HPL_pdpanllT(\fR
\fB\&HPL_T_panel *\fR
\fI\&PANEL\fR,
\fB\&const int\fR
\fI\&M\fR,
\fB\&const int\fR
\fI\&N\fR,
\fB\&const int\fR
\fI\&ICOFF\fR,
\fB\&double *\fR
\fI\&WORK\fR
\fB\&);\fR
.SH DESCRIPTION
\fB\&HPL_pdpanllT\fR
factorizes a panel of columns that is a sub-array of a
larger one-dimensional panel A using the Left-looking variant of the
usual one-dimensional algorithm. The lower triangular N0-by-N0 upper
block of the panel is stored in transpose form.
.PP
Bi-directional exchange is used to perform the swap::broadcast
operations at once for one column in the panel. This results in a
lower number of slightly larger messages than usual. When N is equal
to N0, on P processes and assuming bi-directional links, the running
time of this function can be approximated by:
.PP
   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3
.PP
where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm indeed makes it possible
to achieve almost Level 3 BLAS performance in the panel factorization.
On a large number of modern machines, however, this operation is
latency bound, meaning that its cost can be estimated by the latency
portion alone, N0 * log_2(P) * lat. Mono-directional links will double
this communication cost.
.PP
Note that one iteration of the main loop is unrolled. The local
computation of the maximum absolute value of the next column is
performed just after its update by the current column. This brings the
current column through cache only once at each step. The current
implementation does not perform any blocking for this sequence of
BLAS operations; however, the design allows for plugging in an optimal
(machine-specific) specialized BLAS-like kernel. This idea was
suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
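.PP
The idea of searching for the pivot candidate right after the update
can be sketched as a single fused loop. The kernel below is purely
illustrative and is not the HPL implementation: the function name and
its argument list are hypothetical, and it simplifies the update to a
single scaled-vector subtraction.

```c
#include <stddef.h>

/*
 * Hypothetical fused kernel, not HPL code.
 * Updates y := y - alpha * x and, in the same pass, finds the index
 * of the entry of the updated y with the largest absolute value.
 * Returns 0 when m == 0; ties resolve to the lowest index.
 */
size_t update_and_locmax(size_t m, double alpha,
                         const double *x, double *y)
{
    size_t imax = 0;
    double vmax = -1.0;          /* smaller than any absolute value */

    for (size_t i = 0; i < m; i++) {
        y[i] -= alpha * x[i];                     /* column update */
        double a = (y[i] < 0.0) ? -y[i] : y[i];   /* |y[i]|        */
        if (a > vmax) {                           /* max search    */
            vmax = a;
            imax = i;
        }
    }
    return imax;
}
```

Because the update and the search share one loop, each entry of both
columns is loaded once per step rather than once for the update and
once more for a separate maximum search.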
.SH ARGUMENTS
.TP 8
PANEL   (local input/output)    HPL_T_panel *
On entry, PANEL points to the data structure containing the
panel information.
.TP 8
M       (local input)           const int
On entry, M specifies the local number of rows of sub(A).
.TP 8
N       (local input)           const int
On entry, N specifies the local number of columns of sub(A).
.TP 8
ICOFF   (global input)          const int
On entry, ICOFF specifies the row and column offset of sub(A)
in A.
.TP 8
WORK    (local workspace)       double *
On entry, WORK is a work array of size at least 2*(4+2*N0).
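.PP
A caller might size the workspace as sketched below. Both helpers are
hypothetical and not part of HPL. They only encode the minimum length
2*(4+2*N0) stated above, where n0 stands for N0 as used on this page.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not an HPL function: minimum number of
 * doubles required in WORK for a given N0, i.e. 2*(4+2*N0).
 */
size_t panel_work_len(int n0)
{
    return 2u * (4u + 2u * (size_t)n0);
}

/*
 * Hypothetical helper: allocates a WORK array of the minimum size.
 * Returns NULL on allocation failure; release it with free().
 */
double *panel_work_alloc(int n0)
{
    return malloc(panel_work_len(n0) * sizeof(double));
}
```

For example, N0 = 64 gives a minimum length of 264 doubles.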
.SH SEE ALSO
.BR HPL_dlocmax \ (3),
.BR HPL_dlocswpN \ (3),
.BR HPL_dlocswpT \ (3),
.BR HPL_pdmxswp \ (3),
.BR HPL_pdpancrN \ (3),
.BR HPL_pdpancrT \ (3),
.BR HPL_pdpanllN \ (3),
.BR HPL_pdpanrlN \ (3),
.BR HPL_pdpanrlT \ (3).