<HTML>
<HEAD>
<TITLE>HPL_pdrpancrT HPL 2.0 Library Functions September 10, 2008</TITLE>
</HEAD>

<BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099"
ALINK = "#ffff00">

<H1>Name</H1>
<B>HPL_pdrpancrT</B> Crout recursive panel factorization.

<H1>Synopsis</H1>
<CODE>#include "hpl.h"</CODE><BR><BR>
<CODE>void</CODE>
<CODE>HPL_pdrpancrT(</CODE>
<CODE>HPL_T_panel *</CODE>
<CODE>PANEL</CODE>,
<CODE>const int</CODE>
<CODE>M</CODE>,
<CODE>const int</CODE>
<CODE>N</CODE>,
<CODE>const int</CODE>
<CODE>ICOFF</CODE>,
<CODE>double *</CODE>
<CODE>WORK</CODE>
<CODE>);</CODE>

<H1>Description</H1>
<B>HPL_pdrpancrT</B>
recursively factorizes a panel of columns using the
recursive Crout variant of the usual one-dimensional algorithm.
The N0-by-N0 lower-triangular upper block of the panel, where N0 is
the number of columns of the panel, is stored in transposed form.

A bi-directional exchange is used to perform the swap and broadcast
operations together for each column of the panel. This results in
fewer, slightly larger messages than usual. On P processes and
assuming bi-directional links, the running time of this function
can be approximated by (when N is equal to N0):

<PRE>
   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3
</PRE>

where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the time per floating-point
operation at the Level 2 and Level 3 BLAS rate of execution. The
recursive algorithm indeed allows the panel factorization to achieve
nearly Level 3 BLAS performance. On many modern machines, however,
this operation is latency bound, meaning that its cost can be
estimated by the latency term N0 * log_2(P) * lat alone.
Mono-directional links double this communication cost.

<H1>Arguments</H1>
<PRE>
PANEL   (local input/output)          HPL_T_panel *
        On entry, PANEL points to the data structure containing the
        panel information.
</PRE>
<PRE>
M       (local input)                 const int
        On entry, M specifies the local number of rows of sub(A).
</PRE>
<PRE>
N       (local input)                 const int
        On entry, N specifies the local number of columns of sub(A).
</PRE>
<PRE>
ICOFF   (global input)                const int
        On entry, ICOFF specifies the row and column offset of sub(A)
        in A.
</PRE>
<PRE>
WORK    (local workspace)             double *
        On entry, WORK is a work array of size at least 2*(4+2*N0).
</PRE>
<H1>See Also</H1>
<A HREF="HPL_dlocmax.html">HPL_dlocmax</A>,
<A HREF="HPL_dlocswpN.html">HPL_dlocswpN</A>,
<A HREF="HPL_dlocswpT.html">HPL_dlocswpT</A>,
<A HREF="HPL_pdmxswp.html">HPL_pdmxswp</A>,
<A HREF="HPL_pdpancrN.html">HPL_pdpancrN</A>,
<A HREF="HPL_pdpancrT.html">HPL_pdpancrT</A>,
<A HREF="HPL_pdpanllN.html">HPL_pdpanllN</A>,
<A HREF="HPL_pdpanllT.html">HPL_pdpanllT</A>,
<A HREF="HPL_pdpanrlN.html">HPL_pdpanrlN</A>,
<A HREF="HPL_pdpanrlT.html">HPL_pdpanrlT</A>,
<A HREF="HPL_pdrpancrN.html">HPL_pdrpancrN</A>,
<A HREF="HPL_pdrpanllN.html">HPL_pdrpanllN</A>,
<A HREF="HPL_pdrpanllT.html">HPL_pdrpanllT</A>,
<A HREF="HPL_pdrpanrlN.html">HPL_pdrpanrlN</A>,
<A HREF="HPL_pdrpanrlT.html">HPL_pdrpanrlT</A>,
<A HREF="HPL_pdfact.html">HPL_pdfact</A>.

</BODY>
</HTML>