<HTML>
<HEAD>
<TITLE>HPL Frequently Asked Questions</TITLE>
</HEAD>

<BODY
BGCOLOR     = "WHITE"
BACKGROUND  = "WHITE"
TEXT        = "#000000"
VLINK       = "#000099"
ALINK       = "#947153"
LINK        = "#0000ff">

<H2>HPL Frequently Asked Questions</H2>

<UL>
<LI><A HREF="faqs.html#pbsize">What problem size N should I run?</A>
<LI><A HREF="faqs.html#blsize">What block size NB should I use?</A>
<LI><A HREF="faqs.html#grid">What process grid ratio P x Q should I use?</A>
<LI><A HREF="faqs.html#1node">What about the one processor case?</A>
<LI><A HREF="faqs.html#options">Why so many options in HPL.dat?</A>
<LI><A HREF="faqs.html#outperf">Can HPL be outperformed?</A>
</UL>
<HR NOSHADE>

<H3><A NAME="pbsize">What problem size N should I run?</A></H3>

To find the best performance of your system, aim for the largest
problem size that fits in memory. The amount of memory used by HPL
is essentially the size of the coefficient matrix. For example, if you
have 4 nodes with 256 MB of memory each, this corresponds to 1 GB in
total, i.e., about 134 M (2^27) double precision (8-byte) elements.
The square root of that number is 11585. Some memory must be left for
the OS and for other things, so a problem size of 10000 is likely to
fit. As a rule of thumb, a matrix that uses 80 % of the total amount
of memory is a good starting point. If the problem size you pick is
too large, swapping will occur and the performance will drop. If
multiple processes are spawned on each node (say you have 2 processors
per node), what counts is the amount of memory available to each
process.<BR><BR>
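The arithmetic above can be sketched in a few lines of Python. This is
only an illustration, not part of HPL: the function name
max_problem_size and its percent parameter are hypothetical, and the
sketch assumes that the coefficient matrix is the only significant
consumer of memory.<BR><BR>
<PRE>
```python
import math

BYTES_PER_DOUBLE = 8  # the matrix is stored in double precision


def max_problem_size(total_bytes, percent=100):
    """Largest N such that an N x N matrix of doubles fits in the
    given percentage of total_bytes (integer arithmetic throughout)."""
    usable_bytes = total_bytes * percent // 100
    elements = usable_bytes // BYTES_PER_DOUBLE
    return math.isqrt(elements)


if __name__ == "__main__":
    total = 4 * 256 * 2**20             # 4 nodes with 256 MB each
    print(max_problem_size(total))      # upper bound: 11585
    print(max_problem_size(total, 80))  # 80 % rule of thumb: 10362
```
</PRE>
With the 80 % rule the bound is 10362, which is consistent with the
suggestion that a problem size of 10000 is likely to fit.<BR><BR>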
<HR NOSHADE>

<H3><A NAME="blsize">What block size NB should I use?</A></H3>

HPL uses the block size NB for the data distribution as well as for
the computational granularity. From a data distribution point of
view, the smaller NB, the better the load balance, so you definitely
want to stay away from very large values of NB. From a computation
point of view, too small a value of NB may limit the computational
performance by a large factor because almost no data reuse will occur
in the highest level of the memory hierarchy. The number of messages
will also increase. Efficient matrix-multiply routines are often
internally blocked. Small multiples of this blocking factor are
likely to be good block sizes for HPL. The bottom line is that "good"
block sizes are almost always in the [32 .. 256] interval. The best
values depend on the computation / communication performance ratio of
your system. To a much lesser extent, the problem size matters as
well. Say, for example, that you empirically found that 44 was a good
block size with respect to performance. Then 88 or 132 are likely to
give slightly better results for large problem sizes because of a
slightly higher flop rate.<BR><BR>
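As a minimal sketch of this advice, the following Python helper lists
the small multiples of a blocking factor that fall in the
[32 .. 256] interval. The name candidate_block_sizes and the cutoff
max_multiple = 3 are assumptions made for the illustration; the values
it returns are only candidates to be timed on your system.<BR><BR>
<PRE>
```python
def candidate_block_sizes(kernel_block, low=32, high=256, max_multiple=3):
    """Return the small multiples of kernel_block that lie in [low, high].

    kernel_block is the blocking factor found (or assumed) to work well;
    max_multiple is an arbitrary cutoff for what counts as "small".
    """
    allowed = range(low, high + 1)
    return [k * kernel_block
            for k in range(1, max_multiple + 1)
            if k * kernel_block in allowed]


if __name__ == "__main__":
    print(candidate_block_sizes(44))  # [44, 88, 132]
```
</PRE>
Each candidate would then be tried in HPL.dat and the fastest one
kept.<BR><BR>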
<HR NOSHADE>

<H3><A NAME="grid">What process grid ratio P x Q should I use?</A></H3>

This depends on the physical interconnection network you have.
Assuming a mesh or a switch, HPL "likes" a 1:k ratio with k in [1..3].
In other words, P and Q should be approximately equal, with Q
slightly larger than P. Examples: 2 x 2, 2 x 4, 2 x 5, 3 x 4, 4 x 4,
4 x 6, 5 x 6, 4 x 8 ... If you are running on a simple Ethernet
network, there is only one wire through which all the messages are
exchanged. On such a network, the performance and scalability of HPL
are strongly limited, and very flat process grids are likely to be the
best choices: 1 x 4, 1 x 8, 2 x 4 ...<BR><BR>
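The grid choice can be sketched as follows. The helpers process_grids
and pick_grid are hypothetical names, not HPL routines; the sketch
assumes that every process is used (P * Q equals the process count)
and reads the 1:k guideline as "Q is at most k times P".<BR><BR>
<PRE>
```python
import math


def process_grids(nprocs):
    """All (P, Q) with P * Q == nprocs and Q >= P, flattest grid first."""
    return [(p, nprocs // p)
            for p in range(1, math.isqrt(nprocs) + 1)
            if nprocs % p == 0]


def pick_grid(nprocs, max_ratio=3):
    """Most nearly square grid whose Q is at most max_ratio times P.

    Returns None when no factorization qualifies, for example with
    7 processes, where 1 x 7 is the only grid.
    """
    fitting = [(p, q) for p, q in process_grids(nprocs)
               if max_ratio * p >= q]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    print(pick_grid(24))      # (4, 6)
    print(pick_grid(10))      # (2, 5)
    print(process_grids(8))   # [(1, 8), (2, 4)]
```
</PRE>
On a mesh or a switch, pick_grid gives a starting point; on a simple
Ethernet network, the flat grids at the front of the process_grids
list are the ones to try first.<BR><BR>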
<HR NOSHADE>

<H3><A NAME="1node">What about the one processor case?</A></H3>

HPL has been designed to perform well for large problem sizes on
hundreds of nodes and more. The software also works on one node, and
for large problem sizes one can usually achieve pretty good
performance on a single processor as well. For small problem sizes,
however, the overhead due to message-passing, local indexing and so on
can be significant.<BR><BR>
<HR NOSHADE>

<H3><A NAME="options">Why so many options in HPL.dat?</A></H3>

There are quite a few reasons. First, these options are useful to
determine what matters and what does not on your system. Second, HPL
is often used in the context of early evaluation of new systems. In
such a case, not everything is usually working quite right, and it is
convenient to be able to vary these parameters without recompiling.
Finally, every system has its own peculiarities, and one will likely
want to determine the best set of parameters empirically. In any
case, one can always follow the advice provided in the
<A HREF = "tuning.html">tuning section</A> of this document and not
worry about the complexity of the input file.<BR><BR>
<HR NOSHADE>

<H3><A NAME="outperf">Can HPL be outperformed?</A></H3>

Certainly. There is always room for performance improvements.
Specific knowledge about a particular system is always a source of
performance gains. Even from a generic point of view, better
algorithms or more efficient formulations of the classic ones are
potential winners.<BR><BR>

<HR NOSHADE>
<CENTER>
<A HREF = "index.html">            [Home]</A>
<A HREF = "copyright.html">        [Copyright and Licensing Terms]</A>
<A HREF = "algorithm.html">        [Algorithm]</A>
<A HREF = "scalability.html">      [Scalability]</A>
<A HREF = "results.html">          [Performance Results]</A>
<A HREF = "documentation.html">    [Documentation]</A>
<A HREF = "software.html">         [Software]</A>
<A HREF = "faqs.html">             [FAQs]</A>
<A HREF = "tuning.html">           [Tuning]</A>
<A HREF = "errata.html">           [Errata-Bugs]</A>
<A HREF = "references.html">       [References]</A>
<A HREF = "links.html">            [Related Links]</A><BR>
</CENTER>
<HR NOSHADE>
</BODY>
</HTML>