<HTML>
<HEAD>
<TITLE>HPL Frequently Asked Questions</TITLE>
</HEAD>

<BODY
BGCOLOR = "WHITE"
BACKGROUND = "WHITE"
TEXT = "#000000"
VLINK = "#000099"
ALINK = "#947153"
LINK = "#0000ff">

<H2>HPL Frequently Asked Questions</H2>

<UL>
<LI><A HREF="faqs.html#pbsize">What problem size N should I run?</A>
<LI><A HREF="faqs.html#blsize">What block size NB should I use?</A>
<LI><A HREF="faqs.html#grid">What process grid ratio P x Q should I use?</A>
<LI><A HREF="faqs.html#1node">What about the one processor case?</A>
<LI><A HREF="faqs.html#options">Why so many options in HPL.dat?</A>
<LI><A HREF="faqs.html#outperf">Can HPL be outperformed?</A>
</UL>
<HR NOSHADE>

<H3><A NAME="pbsize">What problem size N should I run?</A></H3>

To measure the best performance of your system, aim for the largest
problem size that fits in memory. The amount of memory used by HPL is
essentially the size of the coefficient matrix. For example, if you
have 4 nodes with 256 MB of memory each, this corresponds to 1 GB
total, i.e., about 134 M double precision (8 bytes) elements. The
square root of that number is 11585. One definitely needs to leave
some memory for the OS as well as for other things, so a problem size
of 10000 is likely to fit. As a rule of thumb, a matrix that occupies
80% of the total amount of memory is a good guess. If the problem
size you pick is too large, swapping will occur and the performance
will drop. If multiple processes are spawned on each node (say you
have 2 processors per node), what counts is the amount of memory
available to each process.<BR><BR>
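The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name <CODE>estimate_problem_size</CODE> and its parameters are hypothetical and not part of HPL, and rounding N down to a multiple of NB is an added convenience that the text does not require.

```python
import math

def estimate_problem_size(nodes, bytes_per_node, fraction=0.80, nb=None):
    """Estimate an HPL problem size N from the total memory.

    fraction is the share of memory given to the coefficient matrix
    (the 80% rule of thumb). If nb is given, N is rounded down to a
    multiple of the block size (an assumption, not an HPL requirement).
    """
    total_bytes = nodes * bytes_per_node
    elements = int(fraction * total_bytes / 8)   # 8 bytes per double
    n = math.isqrt(elements)                     # the matrix is N x N
    if nb:
        n -= n % nb
    return n

MB = 1024 ** 2
print(estimate_problem_size(4, 256 * MB, fraction=1.0))  # upper bound: 11585
print(estimate_problem_size(4, 256 * MB))                # 80% rule: 10362
```

With several processes per node, the same estimate still applies to the total memory, but each process must be able to hold its share of the matrix.<BR><BR>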
<HR NOSHADE>

<H3><A NAME="blsize">What block size NB should I use?</A></H3>

HPL uses the block size NB for the data distribution as well as for
the computational granularity. From a data distribution point of
view, the smaller NB, the better the load balance, so you definitely
want to stay away from very large values of NB. From a computation
point of view, too small a value of NB may limit the computational
performance by a large factor because almost no data reuse will occur
in the highest level of the memory hierarchy. The number of messages
will also increase. Efficient matrix-multiply routines are often
internally blocked, and small multiples of this blocking factor are
likely to be good block sizes for HPL. The bottom line is that "good"
block sizes are almost always in the [32 .. 256] interval. The best
values depend on the computation / communication performance ratio of
your system. To a much lesser extent, the problem size matters as
well. Say, for example, you empirically found that 44 was a good
block size with respect to performance. Then 88 or 132 are likely to
give slightly better results for large problem sizes because of a
slightly higher flop rate.<BR><BR>
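As a small illustration of the "multiples of the blocking factor" advice, the following Python sketch lists the candidate values of NB to try. The helper name <CODE>candidate_block_sizes</CODE> is hypothetical, and the base value has to come from your own measurements.

```python
def candidate_block_sizes(base, low=32, high=256):
    """Return the multiples of base that fall in the [low, high] interval."""
    return [k * base
            for k in range(1, high // base + 1)
            if low <= k * base <= high]

print(candidate_block_sizes(44))  # [44, 88, 132, 176, 220]
```

The list only narrows the search: the smaller multiples are the more likely winners, and each candidate still has to be timed on the target system.<BR><BR>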
<HR NOSHADE>

<H3><A NAME="grid">What process grid ratio P x Q should I use?</A></H3>

This depends on the physical interconnection network you have.
Assuming a mesh or a switch, HPL "likes" a 1:k ratio with k in [1..3].
In other words, P and Q should be approximately equal, with Q
slightly larger than P. Examples: 2 x 2, 2 x 4, 2 x 5, 3 x 4, 4 x 4,
4 x 6, 5 x 6, 4 x 8 ... If you are running on a simple Ethernet
network, there is only one wire through which all the messages are
exchanged. On such a network, the performance and scalability of HPL
are strongly limited, and very flat process grids are likely to be
the best choices: 1 x 4, 1 x 8, 2 x 4 ...<BR><BR>
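The guideline above can be turned into a short enumeration. The Python sketch below is an assumption-laden illustration, not part of HPL: the function name <CODE>process_grids</CODE> and the <CODE>max_ratio</CODE> parameter are hypothetical, and the code simply lists the factorizations P x Q of a process count with P no larger than Q and Q / P within the given ratio.

```python
import math

def process_grids(nprocs, max_ratio=3.0):
    """List (P, Q) with P * Q == nprocs, P <= Q and Q / P <= max_ratio.

    The result is sorted with the most nearly square grid first.
    """
    grids = []
    for p in range(1, math.isqrt(nprocs) + 1):
        if nprocs % p == 0:
            q = nprocs // p
            if q <= max_ratio * p:
                grids.append((p, q))
    return sorted(grids, key=lambda g: g[1] - g[0])

print(process_grids(24))                         # [(4, 6), (3, 8)]
print(process_grids(8, max_ratio=float("inf")))  # [(2, 4), (1, 8)]
```

For a mesh or a switch, the first entries are the ones to try. On a simple Ethernet network, lifting the ratio limit as in the second call exposes the flat grids at the end of the list. A prime number of processes yields an empty list under the default ratio, since only 1 x Q is possible.<BR><BR>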
<HR NOSHADE>

<H3><A NAME="1node">What about the one processor case?</A></H3>

HPL has been designed to perform well for large problem sizes on
hundreds of nodes and more. The software also works on one node, and
for large problem sizes one can usually achieve pretty good
performance on a single processor as well. For small problem sizes,
however, the overhead due to message-passing, local indexing and so
on can be significant.<BR><BR>
<HR NOSHADE>

<H3><A NAME="options">Why so many options in HPL.dat?</A></H3>

There are quite a few reasons. First off, these options are useful to
determine what matters and what does not on your system. Second, HPL
is often used in the context of early evaluation of new systems. In
such a case, not everything is working quite right yet, and it is
convenient to be able to vary these parameters without recompiling.
Finally, every system has its own peculiarities, and one is likely to
want to determine the best set of parameters empirically. In any
case, one can always follow the advice provided in the
<A HREF = "tuning.html">tuning section</A> of this document and not
worry about the complexity of the input file.<BR><BR>
<HR NOSHADE>

<H3><A NAME="outperf">Can HPL be outperformed?</A></H3>

Certainly. There is always room for performance improvements.
Specific knowledge about a particular system is always a source of
performance gains. Even from a generic point of view, better
algorithms or more efficient formulations of the classic ones are
potential winners.<BR><BR>

<HR NOSHADE>
<CENTER>
<A HREF = "index.html"> [Home]</A>
<A HREF = "copyright.html"> [Copyright and Licensing Terms]</A>
<A HREF = "algorithm.html"> [Algorithm]</A>
<A HREF = "scalability.html"> [Scalability]</A>
<A HREF = "results.html"> [Performance Results]</A>
<A HREF = "documentation.html"> [Documentation]</A>
<A HREF = "software.html"> [Software]</A>
<A HREF = "faqs.html"> [FAQs]</A>
<A HREF = "tuning.html"> [Tuning]</A>
<A HREF = "errata.html"> [Errata-Bugs]</A>
<A HREF = "references.html"> [References]</A>
<A HREF = "links.html"> [Related Links]</A><BR>
</CENTER>
<HR NOSHADE>
</BODY>
</HTML>