<HTML>
<HEAD>
<TITLE>HPL_pdpanrlN HPL 2.0 Library Functions September 10, 2008</TITLE>
</HEAD>

<BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099"
      ALINK = "#ffff00">

<H1>Name</H1>
<B>HPL_pdpanrlN</B> Right-looking panel factorization.

<H1>Synopsis</H1>
<CODE>#include "hpl.h"</CODE><BR><BR>
<CODE>void</CODE>
<CODE>HPL_pdpanrlN(</CODE>
<CODE>HPL_T_panel *</CODE>
<CODE>PANEL</CODE>,
<CODE>const int</CODE>
<CODE>M</CODE>,
<CODE>const int</CODE>
<CODE>N</CODE>,
<CODE>const int</CODE>
<CODE>ICOFF</CODE>,
<CODE>double *</CODE>
<CODE>WORK</CODE>
<CODE>);</CODE>

<H1>Description</H1>
<B>HPL_pdpanrlN</B>
factorizes  a panel of columns  that is a sub-array of a
larger one-dimensional panel A using the Right-looking variant of the
usual one-dimensional algorithm.  The lower triangular N0-by-N0 upper
block of the panel is stored in no-transpose form (i.e. just like the
input matrix itself).

Bi-directional exchange is used to perform the swap::broadcast
operations at once for one column in the panel. This results in
fewer, slightly larger messages than usual. On P processes, assuming
bi-directional links and N equal to N0, the running time of this
function can be approximated by:

<PRE>
   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3
</PRE>

where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm indeed makes it possible to
come close to Level 3 BLAS performance in the panel factorization. On
many modern machines, however, this operation is latency bound,
meaning that its cost can be estimated by the latency portion alone,
N0 * log_2(P) * lat. Mono-directional links will double this
communication cost.

Note that one iteration of the main loop is unrolled. The local
search for the entry of maximum absolute value in the next column is
performed just after that column has been updated by the current
column. This brings the current column through cache only once at
each step. The current implementation does not perform any blocking
for this sequence of BLAS operations; however, the design allows for
plugging in an optimal (machine-specific) specialized BLAS-like
kernel. This idea was suggested to us by Fred Gustavson, IBM T.J.
Watson Research Center.
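
The fused update and pivot search described above can be sketched on a single
process as follows. This is a simplified serial illustration, not the HPL
implementation: there is no communication, the name panel_lu_right_looking and
the ipiv output array are hypothetical, the panel is assumed to be stored
column-major with leading dimension lda and M >= N, and plain loops stand in
for the BLAS calls.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Column-major access to row i, column j of the panel. */
#define A_(i, j) a[(size_t)(j) * (size_t)lda + (size_t)(i)]

/*
 * Serial right-looking LU factorization of an m-by-n panel with partial
 * pivoting. On success returns 0; L (unit diagonal) is left below the
 * diagonal, U on and above it, and ipiv[j] is the row swapped with row j.
 * Returns j + 1 if the pivot found for column j is exactly zero.
 */
int panel_lu_right_looking(int m, int n, double *a, int lda, int *ipiv)
{
    int i, j, k, piv = 0;

    assert(m >= n && lda >= m);
    if (n == 0) return 0;

    /* Unrolled first step: pivot search for column 0 only. */
    for (i = 1; i < m; i++)
        if (fabs(A_(i, 0)) > fabs(A_(piv, 0))) piv = i;

    for (j = 0; j < n; j++) {
        double t, inv;
        int next = j + 1;

        ipiv[j] = piv;
        if (A_(piv, j) == 0.0) return j + 1;

        /* Swap rows j and piv across the whole panel. */
        if (piv != j)
            for (k = 0; k < n; k++) {
                t = A_(j, k); A_(j, k) = A_(piv, k); A_(piv, k) = t;
            }

        /* Scale the current column below the diagonal. */
        inv = 1.0 / A_(j, j);
        for (i = next; i < m; i++) A_(i, j) *= inv;

        /* Update the next column and search its pivot in the same pass. */
        if (next < n) {
            double ujk = A_(j, next);
            piv = next;
            for (i = next; i < m; i++) {
                A_(i, next) -= A_(i, j) * ujk;
                if (fabs(A_(i, next)) > fabs(A_(piv, next))) piv = i;
            }
        }

        /* Rank-1 update of the remaining trailing columns. */
        for (k = next + 1; k < n; k++) {
            double ujk = A_(j, k);
            for (i = next; i < m; i++) A_(i, k) -= A_(i, j) * ujk;
        }
    }
    return 0;
}

#undef A_
```

The pivot for column j+1 is already known when step j+1 begins, so no separate
pass over that column is needed.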
<H1>Arguments</H1>
<PRE>
PANEL   (local input/output)          HPL_T_panel *
        On entry,  PANEL  points to the data structure containing the
        panel information.
</PRE>
<PRE>
M       (local input)                 const int
        On entry,  M specifies the local number of rows of sub(A).
</PRE>
<PRE>
N       (local input)                 const int
        On entry,  N specifies the local number of columns of sub(A).
</PRE>
<PRE>
ICOFF   (global input)                const int
        On entry, ICOFF specifies the row and column offset of sub(A)
        in A.
</PRE>
<PRE>
WORK    (local workspace)             double *
        On entry, WORK is a work array of size at least 2*(4+2*N0).
</PRE>
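
The minimum WORK size stated above can be computed as in this small sketch.
The helper names panel_work_len and panel_work_alloc are hypothetical and not
part of HPL; how the panel width N0 is obtained from the PANEL structure is
left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimum number of doubles in WORK for a panel of width n0: 2*(4+2*N0). */
size_t panel_work_len(int n0)
{
    assert(n0 >= 0);
    return 2 * (4 + 2 * (size_t)n0);
}

/* Allocate a WORK array of the minimum size; the caller must free() it.
 * Returns NULL if the allocation fails. */
double *panel_work_alloc(int n0)
{
    return (double *)malloc(panel_work_len(n0) * sizeof(double));
}
```

For N0 = 64 this gives 2 * ( 4 + 128 ) = 264 doubles.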
<H1>See Also</H1>
<A HREF="HPL_dlocmax.html">HPL_dlocmax</A>,
<A HREF="HPL_dlocswpN.html">HPL_dlocswpN</A>,
<A HREF="HPL_dlocswpT.html">HPL_dlocswpT</A>,
<A HREF="HPL_pdmxswp.html">HPL_pdmxswp</A>,
<A HREF="HPL_pdpancrN.html">HPL_pdpancrN</A>,
<A HREF="HPL_pdpancrT.html">HPL_pdpancrT</A>,
<A HREF="HPL_pdpanllN.html">HPL_pdpanllN</A>,
<A HREF="HPL_pdpanllT.html">HPL_pdpanllT</A>,
<A HREF="HPL_pdpanrlT.html">HPL_pdpanrlT</A>.

</BODY>
</HTML>