<HTML>
<HEAD>
<TITLE>HPL_pdfact HPL 2.0 Library Functions September 10, 2008</TITLE>
</HEAD>

<BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099"
      ALINK = "#ffff00">

    
<H1>Name</H1>
<B>HPL_pdfact</B> recursive panel factorization.

    
<H1>Synopsis</H1>
<CODE>#include "hpl.h"</CODE><BR><BR>
<CODE>void</CODE>
<CODE>HPL_pdfact(</CODE>
<CODE>HPL_T_panel *</CODE>
<CODE>PANEL</CODE>
<CODE>);</CODE>

    
<H1>Description</H1>
<B>HPL_pdfact</B>
recursively factorizes a 1-dimensional panel of columns.
The RPFACT function pointer specifies the recursive algorithm to be
used: Crout, left-looking, or right-looking. NBMIN sets the recursion
stopping criterion in terms of the number of columns in the panel,
and NDIV specifies the number of subpanels each panel should be
divided into; a value of 2 is usually chosen. Finally, PFACT is a
function pointer specifying the non-recursive algorithm to be used
on at most NBMIN columns. Here too, one can choose between Crout,
left-looking, and right-looking. Empirical tests seem to indicate
that values of 4 or 8 for NBMIN give the best results.
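 
The interplay of NBMIN and NDIV can be pictured with a small C sketch.
It is only an illustration under stated assumptions, not HPL code: the
function name count_leaf_panels is hypothetical, and the splitting rule
(cut every panel wider than NBMIN into NDIV nearly equal subpanels) is a
simplification that may differ from the rule HPL applies internally.
The function counts how many times the non-recursive PFACT algorithm
would be reached.

```c
/*
 * Hypothetical helper, not part of HPL.
 * Counts the leaf panels (handled by the non-recursive PFACT algorithm)
 * produced when a panel of n columns is split recursively.
 * Assumed splitting rule: a panel wider than nbmin is cut into ndiv
 * nearly equal subpanels. Returns -1 on invalid arguments.
 */
int count_leaf_panels(int n, int nbmin, int ndiv)
{
    int leaves = 0;
    int done = 0;   /* columns already assigned to a subpanel */
    int i;

    if (n < 1 || nbmin < 1 || ndiv < 2)
        return -1;

    /* Stopping criterion: at most nbmin columns go to PFACT. */
    if (n <= nbmin)
        return 1;

    for (i = 0; i < ndiv && done < n; i++) {
        /* Spread the remaining columns over the remaining subpanels. */
        int width = (n - done + (ndiv - i) - 1) / (ndiv - i);

        leaves += count_leaf_panels(width, nbmin, ndiv);
        done += width;
    }
    return leaves;
}
```

With n = 64, NBMIN = 8 and NDIV = 2, this sketch ends in 8 leaf panels
of 8 columns each.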
 
Bi-directional exchange is used to perform the swap::broadcast
operations at once for one column in the panel. This results in
fewer, slightly larger messages than usual. On P processes, and
assuming bi-directional links, the running time of this function
can be approximated by (when N is equal to N0):
 
   N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
   N0^2 * ( M - N0/3 ) * gam2-3
 
where M is the local number of rows of the panel, lat and bdwth are
the latency and bandwidth of the network for double precision real
words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
rate of execution. The recursive algorithm makes it possible to
achieve almost Level 3 BLAS performance in the panel factorization.
On a large number of modern machines, however, this operation is
latency bound, meaning that its cost can be estimated by the latency
portion alone, N0 * log_2(P) * lat. Mono-directional links will
double this communication cost.
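 
As a rough illustration, the estimate above can be evaluated with a few
lines of C. This is a sketch under stated assumptions, not part of HPL:
the struct and function names are hypothetical, gam2-3 is taken as a
time per floating point operation (the formula multiplies it by an
operation count), and log_2(P) is rounded up to a whole number of
exchange rounds, which matches the formula exactly only when P is a
power of two.

```c
/* Hypothetical parameter bundle; fields follow the symbols in the text. */
typedef struct {
    double lat;    /* network latency per message */
    double bdwth;  /* bandwidth in double precision words per time unit */
    double gam23;  /* assumed: time per flop at the Level 2/3 BLAS rate */
} pdfact_model;

/* Smallest k with 2^k >= p (assumption: log_2(P) rounded up). */
int ceil_log2(int p)
{
    int k = 0;

    while ((1L << k) < p)
        k++;
    return k;
}

/* Communication term: N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth). */
double pdfact_comm_time(const pdfact_model *m, int n0, int p)
{
    return n0 * ceil_log2(p) * (m->lat + (2.0 * n0 + 4.0) / m->bdwth);
}

/* Computation term: N0^2 * (M - N0/3) * gam2-3, with M local rows. */
double pdfact_comp_time(const pdfact_model *m, int mloc, int n0)
{
    return (double)n0 * n0 * (mloc - n0 / 3.0) * m->gam23;
}

/* Latency-bound approximation from the text: N0 * log_2(P) * lat. */
double pdfact_latency_only(const pdfact_model *m, int n0, int p)
{
    return n0 * ceil_log2(p) * m->lat;
}
```

For example, with lat = 1, bdwth = 2, N0 = 4 and P = 4, the
communication term is 4 * 2 * (1 + 12/2) = 56 in the time unit of lat,
of which 8 is the latency portion. For mono-directional links, the text
suggests doubling the communication term.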

    
<H1>Arguments</H1>
<PRE>
PANEL   (local input/output)          HPL_T_panel *
        On entry,  PANEL  points to the data structure containing the
        panel information.
</PRE>

    
<H1>See Also</H1>
<A HREF="HPL_dlocmax.html">HPL_dlocmax</A>,
<A HREF="HPL_dlocswpN.html">HPL_dlocswpN</A>,
<A HREF="HPL_dlocswpT.html">HPL_dlocswpT</A>,
<A HREF="HPL_pdmxswp.html">HPL_pdmxswp</A>,
<A HREF="HPL_pdpancrN.html">HPL_pdpancrN</A>,
<A HREF="HPL_pdpancrT.html">HPL_pdpancrT</A>,
<A HREF="HPL_pdpanllN.html">HPL_pdpanllN</A>,
<A HREF="HPL_pdpanllT.html">HPL_pdpanllT</A>,
<A HREF="HPL_pdpanrlN.html">HPL_pdpanrlN</A>,
<A HREF="HPL_pdpanrlT.html">HPL_pdpanrlT</A>,
<A HREF="HPL_pdrpancrN.html">HPL_pdrpancrN</A>,
<A HREF="HPL_pdrpancrT.html">HPL_pdrpancrT</A>,
<A HREF="HPL_pdrpanllN.html">HPL_pdrpanllN</A>,
<A HREF="HPL_pdrpanllT.html">HPL_pdrpanllT</A>,
<A HREF="HPL_pdrpanrlN.html">HPL_pdrpanrlN</A>,
<A HREF="HPL_pdrpanrlT.html">HPL_pdrpanrlT</A>.

    
</BODY>
</HTML>