<HTML>
<HEAD>
<TITLE>HPL_pdrpanllN HPL 2.0 Library Functions September 10, 2008</TITLE>
</HEAD>

<BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099"
      ALINK = "#ffff00">

<H1>Name</H1>
<B>HPL_pdrpanllN</B> Left-looking recursive panel factorization.

<H1>Synopsis</H1>
<CODE>#include "hpl.h"</CODE><BR><BR>
<CODE>void</CODE>
<CODE>HPL_pdrpanllN(</CODE>
<CODE>HPL_T_panel *</CODE>
<CODE>PANEL</CODE>,
<CODE>const int</CODE>
<CODE>M</CODE>,
<CODE>const int</CODE>
<CODE>N</CODE>,
<CODE>const int</CODE>
<CODE>ICOFF</CODE>,
<CODE>double *</CODE>
<CODE>WORK</CODE>
<CODE>);</CODE>

<H1>Description</H1>
<B>HPL_pdrpanllN</B>
recursively factorizes a panel of columns using the left-looking
variant of the one-dimensional algorithm. The lower triangular
N0-by-N0 upper block of the panel is stored in no-transpose form
(i.e. just like the input matrix itself).

Bi-directional exchange is used to perform the swap::broadcast
operations at once for one column in the panel. This results in
fewer, slightly larger messages than usual. On P processes and
assuming bi-directional links, the running time of this function
can be approximated by (when N is equal to N0):

   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3

where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm makes it possible to
achieve nearly Level 3 BLAS performance in the panel factorization.
On many modern machines, however, this operation is latency bound,
meaning that its cost can be estimated by the latency portion alone,
N0 * log_2(P) * lat. Uni-directional links double this
communication cost.
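
The running-time estimate above can be evaluated with a few lines of C. The following is a minimal sketch, not part of the HPL API: the function names are hypothetical, gam23 is taken to be the time per floating-point operation (which is what the units of the formula require), and log_2(P) is rounded up to a whole number of exchange rounds, so the result matches the formula exactly only when P is a power of two.

```c
/* Hypothetical helpers for illustration; none of these exist in HPL. */

/* Smallest r such that 2^r >= p; equals log_2(p) when p is a power of two. */
int ceil_log2(int p)
{
    int  r = 0;
    long v = 1;
    while (v < p) { v *= 2; r++; }
    return r;
}

/* Latency-only estimate: N0 * log_2(P) * lat. */
double panel_latency_estimate(int n0, int p, double lat)
{
    return (double)n0 * (double)ceil_log2(p) * lat;
}

/*
 * Full estimate for N equal to N0:
 *   N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth) + N0^2 * (M - N0/3) * gam23
 *
 * m             local number of rows of the panel
 * lat, bdwth    network latency and bandwidth (bandwidth in double words)
 * gam23         assumed here to be time per floating-point operation
 * bidirectional nonzero for bi-directional links; otherwise the
 *               communication term is doubled
 */
double panel_time_estimate(int m, int n0, int p, double lat, double bdwth,
                           double gam23, int bidirectional)
{
    double steps = (double)n0 * (double)ceil_log2(p);
    double comm  = steps * (lat + (2.0 * n0 + 4.0) / bdwth);
    double comp  = (double)n0 * (double)n0 * ((double)m - n0 / 3.0) * gam23;

    if (!bidirectional)
        comm *= 2.0;
    return comm + comp;
}
```

Comparing panel_latency_estimate with panel_time_estimate for a given machine gives a quick indication of whether the panel factorization is likely to be latency bound.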
<H1>Arguments</H1>
<PRE>
PANEL   (local input/output)          HPL_T_panel *
        On entry,  PANEL  points to the data structure containing the
        panel information.
</PRE>
<PRE>
M       (local input)                 const int
        On entry,  M specifies the local number of rows of sub(A).
</PRE>
<PRE>
N       (local input)                 const int
        On entry,  N specifies the local number of columns of sub(A).
</PRE>
<PRE>
ICOFF   (global input)                const int
        On entry, ICOFF specifies the row and column offset of sub(A)
        in A.
</PRE>
<PRE>
WORK    (local workspace)             double *
        On entry, WORK is a work array of size at least 2*(4+2*N0).
</PRE>
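
The minimum size of WORK stated above can be computed and allocated as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical and not part of HPL, the size is counted in double precision words, and n0 stands for the N0 used in the description.

```c
#include <stdlib.h>

/* Hypothetical helper: minimum WORK length in doubles, 2*(4 + 2*N0). */
size_t panel_work_len(int n0)
{
    return 2u * (4u + 2u * (size_t)n0);
}

/* Hypothetical helper: allocate a WORK array of the minimum length.
 * Returns NULL on allocation failure; the caller releases it with free(). */
double *panel_work_alloc(int n0)
{
    return malloc(panel_work_len(n0) * sizeof(double));
}
```

For example, N0 = 64 gives a minimum length of 2*(4+128) = 264 doubles.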
<H1>See Also</H1>
<A HREF="HPL_dlocmax.html">HPL_dlocmax</A>,
<A HREF="HPL_dlocswpN.html">HPL_dlocswpN</A>,
<A HREF="HPL_dlocswpT.html">HPL_dlocswpT</A>,
<A HREF="HPL_pdmxswp.html">HPL_pdmxswp</A>,
<A HREF="HPL_pdpancrN.html">HPL_pdpancrN</A>,
<A HREF="HPL_pdpancrT.html">HPL_pdpancrT</A>,
<A HREF="HPL_pdpanllN.html">HPL_pdpanllN</A>,
<A HREF="HPL_pdpanllT.html">HPL_pdpanllT</A>,
<A HREF="HPL_pdpanrlN.html">HPL_pdpanrlN</A>,
<A HREF="HPL_pdpanrlT.html">HPL_pdpanrlT</A>,
<A HREF="HPL_pdrpancrN.html">HPL_pdrpancrN</A>,
<A HREF="HPL_pdrpancrT.html">HPL_pdrpancrT</A>,
<A HREF="HPL_pdrpanllT.html">HPL_pdrpanllT</A>,
<A HREF="HPL_pdrpanrlN.html">HPL_pdrpanrlN</A>,
<A HREF="HPL_pdrpanrlT.html">HPL_pdrpanrlT</A>,
<A HREF="HPL_pdfact.html">HPL_pdfact</A>.

</BODY>
</HTML>