<HTML>
<HEAD>
<TITLE>HPL Frequently Asked Questions</TITLE>
</HEAD>

<BODY
BGCOLOR     = "WHITE"
BACKGROUND  = "WHITE"
TEXT        = "#000000"
VLINK       = "#000099"
ALINK       = "#947153"
LINK        = "#0000ff">

<H2>HPL Frequently Asked Questions</H2>

<UL>
<LI><A HREF="faqs.html#pbsize">What problem size N should I run?</A>
<LI><A HREF="faqs.html#blsize">What block size NB should I use?</A>
<LI><A HREF="faqs.html#grid">What process grid ratio P x Q should I use?</A>
<LI><A HREF="faqs.html#1node">What about the one processor case?</A>
<LI><A HREF="faqs.html#options">Why so many options in HPL.dat?</A>
<LI><A HREF="faqs.html#outperf">Can HPL be outperformed?</A>
</UL>
<HR NOSHADE>

<H3><A NAME="pbsize">What problem size N should I run?</A></H3>

To find the best performance of your system, aim for the largest
problem size that fits in memory. The amount of memory used by HPL is
essentially the size of the coefficient matrix. For example, if you
have 4 nodes with 256 MB of memory each, this corresponds to 1 GB in
total, i.e., about 134 million double precision (8-byte) elements. The
square root of that number is 11585. One definitely needs to leave
some memory for the OS as well as for other things, so a problem size
of 10000 is likely to fit. As a rule of thumb, a matrix occupying 80 %
of the total amount of memory is a good guess. If the problem size you
pick is too large, swapping will occur and performance will drop. If
multiple processes are spawned on each node (say you have 2 processors
per node), what counts is the amount of memory available to each
process.<BR><BR>
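The arithmetic above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name
estimate_problem_size and its parameters are hypothetical and not part
of HPL, and memory is counted in binary units (1 MB = 2^20 bytes).

```python
import math

BYTES_PER_DOUBLE = 8  # size of one double precision element


def estimate_problem_size(nodes, mem_per_node_bytes, fraction=0.8):
    """Return the largest N such that an N x N matrix of doubles fits
    in `fraction` of the total memory (hypothetical helper, not HPL)."""
    total_bytes = nodes * mem_per_node_bytes
    usable_elements = int(total_bytes * fraction) // BYTES_PER_DOUBLE
    # The coefficient matrix is N x N, so N is the integer square root.
    return math.isqrt(usable_elements)


MB = 1024 * 1024

# 4 nodes with 256 MB each, using all of the memory: upper bound on N.
print(estimate_problem_size(4, 256 * MB, fraction=1.0))  # 11585

# Same system with the 80 % rule of thumb.
print(estimate_problem_size(4, 256 * MB))  # 10362
```

The second value is consistent with the text: a problem size of 10000
stays below the 80 % estimate for this system.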

<HR NOSHADE>

<H3><A NAME="blsize">What block size NB should I use?</A></H3>

HPL uses the block size NB for the data distribution as well as for
the computational granularity. From a data distribution point of
view, the smaller NB, the better the load balance. You definitely
want to stay away from very large values of NB. From a computational
point of view, too small a value of NB may limit the computational
performance by a large factor, because almost no data reuse will occur
in the highest level of the memory hierarchy. The number of messages
will also increase. Efficient matrix-multiply routines are often
internally blocked. Small multiples of this blocking factor are
likely to be good block sizes for HPL. The bottom line is that "good"
block sizes are almost always in the [32 .. 256] interval. The best
values depend on the computation / communication performance ratio of
your system. To a much lesser extent, the problem size matters as well.
Say, for example, you empirically found that 44 was a good block size
with respect to performance. 88 or 132 are likely to give slightly
better results for large problem sizes because of a slightly higher
flop rate.<BR><BR>
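As a rough illustration of the "small multiples" advice, the following
Python sketch lists candidate NB values to try. The function name
candidate_block_sizes, the parameter kernel_block (the blocking factor
you found empirically) and the default of three multiples are
assumptions made for this example; the bounds 32 and 256 come from the
interval quoted above.

```python
def candidate_block_sizes(kernel_block, max_multiple=3, low=32, high=256):
    """List small multiples of an empirically good blocking factor that
    fall inside the [low .. high] interval (hypothetical helper)."""
    candidates = []
    for k in range(1, max_multiple + 1):
        nb = k * kernel_block
        if low <= nb <= high:
            candidates.append(nb)
    return candidates


# A blocking factor of 44 gives the values mentioned in the text.
print(candidate_block_sizes(44))  # [44, 88, 132]
```

Each candidate still has to be timed on the target system; the sketch
only narrows down which values of NB are worth benchmarking.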

<HR NOSHADE>

<H3><A NAME="grid">What process grid ratio P x Q should I use?</A></H3>

This depends on the physical interconnection network you have.
Assuming a mesh or a switch, HPL "likes" a 1:k ratio with k in [1..3].
In other words, P and Q should be approximately equal, with Q
slightly larger than P. Examples: 2 x 2, 2 x 4, 2 x 5, 3 x 4, 4 x 4,
4 x 6, 5 x 6, 4 x 8 ... If you are running on a simple Ethernet
network, there is only one wire through which all the messages are
exchanged. On such a network, the performance and scalability of HPL
are strongly limited, and very flat process grids are likely to be the
best choices: 1 x 4, 1 x 8, 2 x 4 ...<BR><BR>
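The grid guidance above can be sketched as a small Python helper that
enumerates the P x Q factorizations of a given process count. The
function names process_grids and flat_grids are hypothetical, and the
cut-off max_p=2 for "very flat" grids is an assumption chosen to match
the examples in the text.

```python
import math


def process_grids(nprocs, max_ratio=3.0):
    """Return (P, Q) pairs with P * Q == nprocs, P <= Q and Q / P no
    larger than max_ratio, i.e. a 1:k ratio with k in [1..3]."""
    grids = []
    for p in range(1, math.isqrt(nprocs) + 1):
        if nprocs % p == 0:
            q = nprocs // p
            if q / p <= max_ratio:
                grids.append((p, q))
    return grids


def flat_grids(nprocs, max_p=2):
    """Return very flat (P, Q) grids, for example for a simple Ethernet
    network; max_p is an assumed cut-off, not an HPL parameter."""
    return [(p, nprocs // p) for p in range(1, max_p + 1) if nprocs % p == 0]


print(process_grids(8))   # [(2, 4)]
print(process_grids(24))  # [(3, 8), (4, 6)]
print(flat_grids(8))      # [(1, 8), (2, 4)]
```

Which of the listed grids performs best still depends on the
interconnect, so the candidates are meant to be benchmarked rather
than taken as final.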

<HR NOSHADE>

<H3><A NAME="1node">What about the one processor case?</A></H3>

HPL has been designed to perform well for large problem sizes on
hundreds of nodes and more. The software also works on one node, and
for large problem sizes one can usually achieve pretty good
performance on a single processor as well. For small problem sizes,
however, the overhead due to message-passing, local indexing and so on
can be significant.<BR><BR>

<HR NOSHADE>

<H3><A NAME="options">Why so many options in HPL.dat?</A></H3>

There are quite a few reasons. First off, these options are useful to
determine what matters and what does not on your system. Second, HPL
is often used in the context of early evaluation of new systems. In
such a case, not everything is working quite right yet, and it is
convenient to be able to vary these parameters without recompiling.
Finally, every system has its own peculiarities, and one is likely to
want to determine the best set of parameters empirically. In
any case, one can always follow the advice provided in the
<A HREF = "tuning.html">tuning section</A> of this document and not
worry about the complexity of the input file.<BR><BR>

<HR NOSHADE>

<H3><A NAME="outperf">Can HPL be outperformed?</A></H3>

Certainly. There is always room for performance improvements.
Specific knowledge about a particular system is always a source of
performance gains. Even from a generic point of view, better
algorithms or more efficient formulations of the classic ones are
potential winners.<BR><BR>

<HR NOSHADE>
<CENTER>
<A HREF = "index.html">            [Home]</A>
<A HREF = "copyright.html">        [Copyright and Licensing Terms]</A>
<A HREF = "algorithm.html">        [Algorithm]</A>
<A HREF = "scalability.html">      [Scalability]</A>
<A HREF = "results.html">          [Performance Results]</A>
<A HREF = "documentation.html">    [Documentation]</A>
<A HREF = "software.html">         [Software]</A>
<A HREF = "faqs.html">             [FAQs]</A>
<A HREF = "tuning.html">           [Tuning]</A>
<A HREF = "errata.html">           [Errata-Bugs]</A>
<A HREF = "references.html">       [References]</A>
<A HREF = "links.html">            [Related Links]</A><BR>
</CENTER>
<HR NOSHADE>
</BODY>
</HTML>
