<HTML>
<HEAD>
<TITLE>HPL_pdfact HPL 2.0 Library Functions September 10, 2008</TITLE>
</HEAD>

<BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099"
ALINK = "#ffff00">

<H1>Name</H1>
<B>HPL_pdfact</B> recursive panel factorization.

<H1>Synopsis</H1>
<CODE>#include "hpl.h"</CODE><BR><BR>
<CODE>void</CODE>
<CODE>HPL_pdfact(</CODE>
<CODE>HPL_T_panel *</CODE>
<CODE>PANEL</CODE>
<CODE>);</CODE>

<H1>Description</H1>
<B>HPL_pdfact</B>
recursively factorizes a 1-dimensional panel of columns.
The RPFACT function pointer specifies the recursive algorithm to be
used: Crout, left-looking, or right-looking. NBMIN sets the recursive
stopping criterion in terms of the number of columns in the panel,
and NDIV specifies the number of subpanels each panel is divided
into; a value of 2 is usually chosen. Finally, PFACT is a function
pointer specifying the non-recursive algorithm applied to panels of
at most NBMIN columns; here too one can choose between the Crout,
left-looking, and right-looking variants. Empirical tests suggest
that NBMIN values of 4 or 8 give the best results.
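
The way NDIV and NBMIN drive the recursion can be illustrated with a small sequential sketch. The C code below is not HPL's implementation: it runs on a single process with no MPI communication, fixes both RPFACT and PFACT to the right-looking variant, and applies each pivot's row swap across the whole panel width. The functions panel_rpfact, panel_lu_residual, rpfact_right, pfact_right and swap_rows, the column-major layout, and the convention that ipiv[k] holds the row swapped with row k are all hypothetical choices made for illustration.

```c
/* Hypothetical sequential sketch of a recursive panel LU (not HPL code):
 * no MPI, right-looking RPFACT and PFACT, partial pivoting. */
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Column-major index of element (i, j) with leading dimension ld. */
#define IDX(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Swap rows r1 and r2 across all n columns of the panel. */
static void swap_rows(double *a, int lda, int n, int r1, int r2) {
    if (r1 == r2) return;
    for (int c = 0; c < n; ++c) {
        double t = a[IDX(r1, c, lda)];
        a[IDX(r1, c, lda)] = a[IDX(r2, c, lda)];
        a[IDX(r2, c, lda)] = t;
    }
}

/* Non-recursive (PFACT-like) right-looking LU of columns [c0, c1) with
 * partial pivoting. Returns 0, or 1 + index of the first zero pivot. */
static int pfact_right(double *a, int lda, int m, int n, int c0, int c1,
                       int *ipiv) {
    int info = 0;
    for (int k = c0; k < c1; ++k) {
        int p = k;
        for (int i = k + 1; i < m; ++i)
            if (fabs(a[IDX(i, k, lda)]) > fabs(a[IDX(p, k, lda)])) p = i;
        ipiv[k] = p;
        swap_rows(a, lda, n, k, p);
        double piv = a[IDX(k, k, lda)];
        if (piv == 0.0) {               /* column is zero below k: skip */
            if (info == 0) info = k + 1;
            continue;
        }
        for (int i = k + 1; i < m; ++i) a[IDX(i, k, lda)] /= piv;
        for (int c = k + 1; c < c1; ++c)  /* rank-1 update inside [c0, c1) */
            for (int i = k + 1; i < m; ++i)
                a[IDX(i, c, lda)] -= a[IDX(i, k, lda)] * a[IDX(k, c, lda)];
    }
    return info;
}

/* Recursive (RPFACT-like) right-looking LU of columns [c0, c1): split into
 * ndiv subpanels until a subpanel has at most nbmin columns. */
static int rpfact_right(double *a, int lda, int m, int n, int c0, int c1,
                        int ndiv, int nbmin, int *ipiv) {
    int w = c1 - c0, info = 0;
    if (w <= nbmin) return pfact_right(a, lda, m, n, c0, c1, ipiv);
    for (int d = 0; d < ndiv; ++d) {
        int j = c0 + (w * d) / ndiv, je = c0 + (w * (d + 1)) / ndiv;
        if (j == je) continue;                       /* empty subpanel */
        int sub = rpfact_right(a, lda, m, n, j, je, ndiv, nbmin, ipiv);
        if (info == 0) info = sub;
        for (int c = je; c < c1; ++c) {
            /* U12 = inv(L11) * A12 by unit-lower forward substitution. */
            for (int i = j + 1; i < je; ++i)
                for (int r = j; r < i; ++r)
                    a[IDX(i, c, lda)] -= a[IDX(i, r, lda)] * a[IDX(r, c, lda)];
            /* A22 -= L21 * U12, limited to the columns of this call. */
            for (int r = j; r < je; ++r)
                for (int i = je; i < m; ++i)
                    a[IDX(i, c, lda)] -= a[IDX(i, r, lda)] * a[IDX(r, c, lda)];
        }
    }
    return info;
}

/* Factor an m x n column-major panel (m >= n) in place. */
int panel_rpfact(double *a, int lda, int m, int n, int ndiv, int nbmin,
                 int *ipiv) {
    assert(m >= n && lda >= m && ndiv >= 2 && nbmin >= 1);
    return rpfact_right(a, lda, m, n, 0, n, ndiv, nbmin, ipiv);
}

/* Max-norm of P*A0 - L*U, to check a factorization produced above. */
double panel_lu_residual(const double *a0, const double *lu, const int *ipiv,
                         int lda, int m, int n) {
    int *perm = malloc((size_t)m * sizeof *perm);
    if (perm == NULL) return INFINITY;
    for (int i = 0; i < m; ++i) perm[i] = i;
    for (int k = 0; k < n; ++k) {                    /* replay row swaps */
        int t = perm[k]; perm[k] = perm[ipiv[k]]; perm[ipiv[k]] = t;
    }
    double worst = 0.0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double s = 0.0;                          /* (L*U)(i, j) */
            for (int k = 0; k <= (i < j ? i : j); ++k)
                s += (k == i ? 1.0 : lu[IDX(i, k, lda)]) * lu[IDX(k, j, lda)];
            double d = fabs(a0[IDX(perm[i], j, lda)] - s);
            if (d > worst) worst = d;
        }
    free(perm);
    return worst;
}
```

For example, on a 7-by-5 panel, ndiv = 2 with nbmin = 1 recurses down to single columns, while nbmin = 8 hands the whole panel straight to the non-recursive routine. In both cases panel_lu_residual should return a value on the order of rounding error. Splitting with the bounds c0 + (w * d) / ndiv keeps the subpanel widths within one column of each other, which is a choice of this sketch rather than something the text prescribes.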

Bi-directional exchange is used to perform the combined swap and
broadcast operations for one column of the panel at once. This
results in fewer, slightly larger messages than usual. On P processes
and assuming bi-directional links, the running time of this function
can be approximated (when N is equal to N0) by:
<PRE>
    N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
    N0^2 * ( M - N0/3 ) * gam2-3
</PRE>
where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is the time per floating-point operation achieved
by the Level 2 and Level 3 BLAS (the inverse of their rate of
execution). The recursive algorithm indeed allows the panel
factorization to achieve nearly Level 3 BLAS performance. On many
modern machines, however, this operation is latency bound, meaning
that its cost can be estimated by the latency term
N0 * log_2(P) * lat alone. Mono-directional links double this
communication cost.
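
The model above can be evaluated directly to see when the latency term dominates. The sketch below is a hypothetical helper: pdfact_model, pdfact_cost and exchange_steps are illustrative names, not part of HPL. It assumes lat is in seconds per message, bdwth in double precision words per second, and gam23 in seconds per floating-point operation, so that every term is a time. It also rounds log_2(P) up to a whole number of exchange steps, which matches the formula exactly when P is a power of two. Following the text, mono-directional links double the communication terms.

```c
/* Hypothetical helper (not part of HPL) evaluating the HPL_pdfact
 * running-time model. Units assumed: lat in seconds per message, bdwth in
 * double precision words per second, gam23 in seconds per flop. */
#include <assert.h>

/* Whole exchange steps for p processes: ceil(log2(p)). This equals
 * log_2(P) in the model when p is a power of two. */
static int exchange_steps(int p) {
    int s = 0;
    while ((1 << s) < p) ++s;
    return s;
}

typedef struct {
    double comm;     /* N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth)  */
    double comp;     /* N0^2 * (M - N0/3) * gam2-3                   */
    double latency;  /* latency-only part N0 * log_2(P) * lat        */
} pdfact_cost;

/* n0: panel width N0, m: local rows M (assumed >= N0), p: processes P.
 * bidirectional == 0 models mono-directional links, which double the
 * communication terms. */
pdfact_cost pdfact_model(int n0, int m, int p, double lat, double bdwth,
                         double gam23, int bidirectional) {
    assert(n0 > 0 && m >= n0 && p >= 1 && bdwth > 0.0);
    double rounds = (double)n0 * exchange_steps(p);
    double links = bidirectional ? 1.0 : 2.0;
    pdfact_cost c;
    c.comm = links * rounds * (lat + (2.0 * n0 + 4.0) / bdwth);
    c.comp = (double)n0 * n0 * ((double)m - n0 / 3.0) * gam23;
    c.latency = links * rounds * lat;
    return c;
}
```

With made-up parameters N0 = 4, M = 100, P = 4, lat = 1e-6 s, bdwth = 1e8 words/s and gam23 = 1e-9 s, the communication term is 8.96e-6 s, of which 8e-6 s is pure latency, while the computation term is only about 1.6e-6 s. This illustrates how narrow panels can end up latency bound.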

<H1>Arguments</H1>
<PRE>
PANEL   (local input/output)          HPL_T_panel *
        On entry,  PANEL  points to the data structure containing the
        panel information.
</PRE>

<H1>See Also</H1>
<A HREF="HPL_dlocmax.html">HPL_dlocmax</A>,
<A HREF="HPL_dlocswpN.html">HPL_dlocswpN</A>,
<A HREF="HPL_dlocswpT.html">HPL_dlocswpT</A>,
<A HREF="HPL_pdmxswp.html">HPL_pdmxswp</A>,
<A HREF="HPL_pdpancrN.html">HPL_pdpancrN</A>,
<A HREF="HPL_pdpancrT.html">HPL_pdpancrT</A>,
<A HREF="HPL_pdpanllN.html">HPL_pdpanllN</A>,
<A HREF="HPL_pdpanllT.html">HPL_pdpanllT</A>,
<A HREF="HPL_pdpanrlN.html">HPL_pdpanrlN</A>,
<A HREF="HPL_pdpanrlT.html">HPL_pdpanrlT</A>,
<A HREF="HPL_pdrpancrN.html">HPL_pdrpancrN</A>,
<A HREF="HPL_pdrpancrT.html">HPL_pdrpancrT</A>,
<A HREF="HPL_pdrpanllN.html">HPL_pdrpanllN</A>,
<A HREF="HPL_pdrpanllT.html">HPL_pdrpanllT</A>,
<A HREF="HPL_pdrpanrlN.html">HPL_pdrpanrlN</A>,
<A HREF="HPL_pdrpanrlT.html">HPL_pdrpanrlT</A>.

</BODY>
</HTML>