/www/HPL_plindx0.html - Annoter - HPL sur GPU - Forge du Centre Blaise Pascal

1

equemene

<HTML>

2

1

equemene

<HEAD>

3

1

equemene

<TITLE>HPL_plindx0 HPL 2.0 Library Functions September 10, 2008</TITLE>

4

1

equemene

</HEAD>

5

1

equemene

6

1

equemene

<BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099"

7

1

equemene

      ALINK = "#ffff00">

8

1

equemene

9

1

equemene

<H1>Name</H1>

10

1

equemene

<B>HPL_plindx0</B> Compute local swapping index arrays.

11

1

equemene

12

1

equemene

<H1>Synopsis</H1>

13

1

equemene

<CODE>#include "hpl.h"</CODE><BR><BR>

14

1

equemene

<CODE>void</CODE>

15

1

equemene

<CODE>HPL_plindx0(</CODE>

16

1

equemene

<CODE>HPL_T_panel *</CODE>

17

1

equemene

<CODE>PANEL</CODE>,

18

1

equemene

<CODE>const int</CODE>

19

1

equemene

<CODE>K</CODE>,

20

1

equemene

<CODE>int *</CODE>

21

1

equemene

<CODE>IPID</CODE>,

22

1

equemene

<CODE>int *</CODE>

23

1

equemene

<CODE>LINDXA</CODE>,

24

1

equemene

<CODE>int *</CODE>

25

1

equemene

<CODE>LINDXAU</CODE>,

26

1

equemene

<CODE>int *</CODE>

27

1

equemene

<CODE>LLEN</CODE>

28

1

equemene

<CODE>);</CODE>

29

1

equemene

30

1

equemene

<H1>Description</H1>

31

1

equemene

<B>HPL_plindx0</B>

32

1

equemene

computes two local arrays  LINDXA and  LINDXAU  containing

33

1

equemene

the  local  source and final destination position  resulting from the

34

1

equemene

application of row interchanges.

35

1

equemene

36

1

equemene

On entry, the array  IPID  of length K is such that the row of global

37

1

equemene

index  IPID(i)  should be mapped onto row of global index  IPID(i+1).

38

1

equemene

Let  IA  be the global index of the first row to be swapped. For k in

39

1

equemene

[0..K/2), the row of global index IPID(2*k) should be mapped onto the

40

1

equemene

row of global index  IPID(2*k+1).  The question then, is to determine

41

1

equemene

which rows should ultimately be part of U.

42

1

equemene

43

1

equemene

First, some rows of the process ICURROW  may be swapped locally.  One

44

1

equemene

of this row belongs to U, the other one belongs to my local  piece of

45

1

equemene

A.  The other  rows of the current block are swapped with remote rows

46

1

equemene

and are thus not part of U. These rows however should be sent  along,

47

1

equemene

and  grabbed by the other processes  as we  progress in the  exchange

48

1

equemene

phase.

49

1

equemene

50

1

equemene

So, assume that I am  ICURROW  and consider a row of index  IPID(2*i)

51

1

equemene

that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less

52

1

equemene

than N,  this row is locally swapped and should be copied into  U  at

53

1

equemene

the position IPID(2*i+1) - IA. No row will be exchanged for this one.

54

1

equemene

If IPID(2*i+1)-IA is greater than N, then the row IPID(2*i) should be

55

1

equemene

locally copied into my local piece of A at the position corresponding

56

1

equemene

to the row of global index IPID(2*i+1).

57

1

equemene

58

1

equemene

If the process  ICURROW does not own  IPID(2*i+1), then row IPID(2*i)

59

1

equemene

is to be swapped away and strictly speaking does not belong to U, but

60

1

equemene

to  A  remotely.  Since this  process will however send this array U,

61

1

equemene

this row is  copied into  U, exactly where the row IPID(2*i+1) should

62

1

equemene

go. For this, we search IPID for k1, such that IPID(2*k1) is equal to

63

1

equemene

IPID(2*i+1); and row  IPID(2*i) is to be copied in U  at the position

64

1

equemene

IPID(2*k1+1)-IA.

65

1

equemene

66

1

equemene

It is thus  important to put the rows that go into U, i.e., such that

67

1

equemene

IPID(2*i+1) - IA is less than N at the begining of the array IPID. By

68

1

equemene

doing so,  U  is formed, and the local copy  is performed in just one

69

1

equemene

sweep.

70

1

equemene

71

1

equemene

Two lists  LINDXA  and  LINDXAU are built.  LINDXA contains the local

72

1

equemene

index of the rows I have that should be copied. LINDXAU  contains the

73

1

equemene

local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A

74

1

equemene

is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k)

75

1

equemene

of A should be locally copied into A(-LINDXAU(k),:).  In the  process

76

1

equemene

ICURROW, the initial packing algorithm proceeds as follows.

77

1

equemene

78

1

equemene

  for all entries in IPID,

79

1

equemene

     if IPID(2*i) is in ICURROW,

80

1

equemene

        if IPID(2*i+1) is in ICURROW,

81

1

equemene

           if( IPID(2*i+1) - IA < N )

82

1

equemene

            save corresponding local position

83

1

equemene

            of this row (LINDXA);

84

1

equemene

            save local position (LINDXAU) in U

85

1

equemene

            where this row goes;

86

1

equemene

            [copy row IPID(2*i) in U at position

87

1

equemene

            IPID(2*i+1)-IA; ];

88

1

equemene

           else

89

1

equemene

            save corresponding local position of

90

1

equemene

            this row (LINDXA);

91

1

equemene

            save local position (-LINDXAU) in A

92

1

equemene

            where this row goes;

93

1

equemene

            [copy row IPID(2*i) in my piece of A

94

1

equemene

            at IPID(2*i+1);]

95

1

equemene

           end if

96

1

equemene

        else

97

1

equemene

           find k1 such that IPID(2*k1) = IPID(2*i+1);

98

1

equemene

           copy row IPID(2*i) in U at position

99

1

equemene

           IPID(2*k1+1)-IA;

100

1

equemene

           save corresponding local position of this

101

1

equemene

           row (LINDXA);

102

1

equemene

           save local position (LINDXAU) in U where

103

1

equemene

           this row goes;

104

1

equemene

        end if

105

1

equemene

     end if

106

1

equemene

  end for

107

1

equemene

108

1

equemene

Second, if I am not the current row process  ICURROW, all source rows

109

1

equemene

in IPID that I own are part of U. Indeed,  they  are swapped with one

110

1

equemene

row  of  the  current  block  of rows,  and  the  main  factorization

111

1

equemene

algorithm proceeds one row after each other.  The processes different

112

1

equemene

from ICURROW,  should  exchange and accumulate  those rows until they

113

1

equemene

receive some data previously owned by the process ICURROW.

114

1

equemene

115

1

equemene

In processes different from  ICURROW,  the  initial packing algorithm

116

1

equemene

proceeds as follows.  Consider a row of global index IPID(2*i) that I

117

1

equemene

own. When I will be receiving data previously owned by ICURROW, i.e.,

118

1

equemene

U, row IPID(2*i) should  replace the row in U at pos. IPID(2*i+1)-IA,

119

1

equemene

and  this particular row of U should be first copied into my piece of

120

1

equemene

A, at A(il,:),  where  il is the  local row  index  corresponding  to

121

1

equemene

IPID(2*i). Now,initially, this row will be packed into workspace, say

122

1

equemene

as the kth row of  that  work array.  The  following  algorithm  sets

123

1

equemene

LINDXAU[k] to IPID(2*i+1)-IA, that is the position in U where the row

124

1

equemene

should be copied. LINDXA(k) stores the local index in  A  where  this

125

1

equemene

row of U should be copied, i.e il.

126

1

equemene

127

1

equemene

  for all entries in IPID,

128

1

equemene

     if IPID(2*i) is not in ICURROW,

129

1

equemene

        copy row IPID(2*i) in work array;

130

1

equemene

        save corresponding local position

131

1

equemene

        of this row (LINDXA);

132

1

equemene

        save position (LINDXAU) in U where

133

1

equemene

        this row should be copied;

134

1

equemene

     end if

135

1

equemene

  end for

136

1

equemene

137

1

equemene

Since we are at it, we also globally figure  out  how many rows every

138

1

equemene

process has. That is necessary, because it would rather be cumbersome

139

1

equemene

to  figure it on  the fly  during the  bi-directional exchange phase.

140

1

equemene

This information is kept in the array  LLEN  of size NPROW. Also note

141

1

equemene

that the arrays LINDXA and LINDXAU are of max length equal to 2*N.

142

1

equemene

143

1

equemene

<H1>Arguments</H1>

144

1

equemene

<PRE>

145

1

equemene

PANEL   (local input/output)          HPL_T_panel *

146

1

equemene

        On entry,  PANEL  points to the data structure containing the

147

1

equemene

        panel information.

148

1

equemene

</PRE>

149

1

equemene

<PRE>

150

1

equemene

K       (global input)                const int

151

1

equemene

        On entry, K specifies the number of entries in IPID.  K is at

152

1

equemene

        least 2*N, and at most 4*N.

153

1

equemene

</PRE>

154

1

equemene

<PRE>

155

1

equemene

IPID    (global input)                int *

156

1

equemene

        On entry,  IPID  is an array of length K. The first K entries

157

1

equemene

        of that array contain the src and final destination resulting

158

1

equemene

        from the application of the interchanges.

159

1

equemene

</PRE>

160

1

equemene

<PRE>

161

1

equemene

LINDXA  (local output)                int *

162

1

equemene

        On entry, LINDXA  is an array of dimension 2*N. On exit, this

163

1

equemene

        array contains the local indexes of the rows of A I have that

164

1

equemene

        should be copied into U.

165

1

equemene

</PRE>

166

1

equemene

<PRE>

167

1

equemene

LINDXAU (local output)                int *

168

1

equemene

        On exit, LINDXAU  is an array of dimension 2*N. On exit, this

169

1

equemene

        array contains  the local destination  information encoded as

170

1

equemene

        follows.  If LINDXAU(k) >= 0, row  LINDXA(k)  of A  is  to be

171

1

equemene

        copied in U at position LINDXAU(k).  Otherwise, row LINDXA(k)

172

1

equemene

        of A should be locally copied into A(-LINDXAU(k),:).

173

1

equemene

</PRE>

174

1

equemene

<PRE>

175

1

equemene

LLEN    (global output)               int *

176

1

equemene

        On entry,  LLEN  is  an array  of length  NPROW.  On exit, it

177

1

equemene

        contains how many rows every process has.

178

1

equemene

</PRE>

179

1

equemene

180

1

equemene

<H1>See Also</H1>

181

1

equemene

<A HREF="HPL_pdlaswp00N.html">HPL_pdlaswp00N</A>,

182

1

equemene

<A HREF="HPL_pdlaswp00T.html">HPL_pdlaswp00T</A>,

183

1

equemene

<A HREF="HPL_pdlaswp01N.html">HPL_pdlaswp01N</A>,

184

1

equemene

<A HREF="HPL_pdlaswp01T.html">HPL_pdlaswp01T</A>.

185

1

equemene

186

1

equemene

</BODY>

187

1

equemene

</HTML>

Centre Blaise Pascal » HPL sur GPU

root / www / HPL_plindx0.html

-equemene
+<HTML>
-equemene
+<HEAD>
-equemene
+<TITLE>HPL_plindx0 HPL 2.0 Library Functions September 10, 2008</TITLE>
-equemene
+</HEAD>
 equemene
-equemene
+<BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099"
-equemene
+      ALINK = "#ffff00">
 equemene
-equemene
+<H1>Name</H1>
-equemene
+<B>HPL_plindx0</B> Compute local swapping index arrays.
 equemene
-equemene
+<H1>Synopsis</H1>
-equemene
+<CODE>#include "hpl.h"</CODE><BR><BR>
-equemene
+<CODE>void</CODE>
-equemene
+<CODE>HPL_plindx0(</CODE>
-equemene
+<CODE>HPL_T_panel *</CODE>
-equemene
+<CODE>PANEL</CODE>,
-equemene
+<CODE>const int</CODE>
-equemene
+<CODE>K</CODE>,
-equemene
+<CODE>int *</CODE>
-equemene
+<CODE>IPID</CODE>,
-equemene
+<CODE>int *</CODE>
-equemene
+<CODE>LINDXA</CODE>,
-equemene
+<CODE>int *</CODE>
-equemene
+<CODE>LINDXAU</CODE>,
-equemene
+<CODE>int *</CODE>
-equemene
+<CODE>LLEN</CODE>
-equemene
+<CODE>);</CODE>
 equemene
-equemene
+<H1>Description</H1>
-equemene
+<B>HPL_plindx0</B>
-equemene
+computes two local arrays  LINDXA and  LINDXAU  containing
-equemene
+the  local  source and final destination position  resulting from the
-equemene
+application of row interchanges.
 equemene
-equemene
+On entry, the array  IPID  of length K is such that the row of global
-equemene
+index  IPID(i)  should be mapped onto row of global index  IPID(i+1).
-equemene
+Let  IA  be the global index of the first row to be swapped. For k in
-equemene
+[0..K/2), the row of global index IPID(2*k) should be mapped onto the
-equemene
+row of global index  IPID(2*k+1).  The question then, is to determine
-equemene
+which rows should ultimately be part of U.
 equemene
-equemene
+First, some rows of the process ICURROW  may be swapped locally.  One
-equemene
+of this row belongs to U, the other one belongs to my local  piece of
-equemene
+A.  The other  rows of the current block are swapped with remote rows
-equemene
+and are thus not part of U. These rows however should be sent  along,
-equemene
+and  grabbed by the other processes  as we  progress in the  exchange
-equemene
+phase.
 equemene
-equemene
+So, assume that I am  ICURROW  and consider a row of index  IPID(2*i)
-equemene
+that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less
-equemene
+than N,  this row is locally swapped and should be copied into  U  at
-equemene
+the position IPID(2*i+1) - IA. No row will be exchanged for this one.
-equemene
+If IPID(2*i+1)-IA is greater than N, then the row IPID(2*i) should be
-equemene
+locally copied into my local piece of A at the position corresponding
-equemene
+to the row of global index IPID(2*i+1).
 equemene
-equemene
+If the process  ICURROW does not own  IPID(2*i+1), then row IPID(2*i)
-equemene
+is to be swapped away and strictly speaking does not belong to U, but
-equemene
+to  A  remotely.  Since this  process will however send this array U,
-equemene
+this row is  copied into  U, exactly where the row IPID(2*i+1) should
-equemene
+go. For this, we search IPID for k1, such that IPID(2*k1) is equal to
-equemene
+IPID(2*i+1); and row  IPID(2*i) is to be copied in U  at the position
-equemene
+IPID(2*k1+1)-IA.
 equemene
-equemene
+It is thus  important to put the rows that go into U, i.e., such that
-equemene
+IPID(2*i+1) - IA is less than N at the begining of the array IPID. By
-equemene
+doing so,  U  is formed, and the local copy  is performed in just one
-equemene
+sweep.
 equemene
-equemene
+Two lists  LINDXA  and  LINDXAU are built.  LINDXA contains the local
-equemene
+index of the rows I have that should be copied. LINDXAU  contains the
-equemene
+local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A
-equemene
+is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k)
-equemene
+of A should be locally copied into A(-LINDXAU(k),:).  In the  process
-equemene
+ICURROW, the initial packing algorithm proceeds as follows.
 equemene
-equemene
+  for all entries in IPID,
-equemene
+     if IPID(2*i) is in ICURROW,
-equemene
+        if IPID(2*i+1) is in ICURROW,
-equemene
+           if( IPID(2*i+1) - IA < N )
-equemene
+            save corresponding local position
-equemene
+            of this row (LINDXA);
-equemene
+            save local position (LINDXAU) in U
-equemene
+            where this row goes;
-equemene
+            [copy row IPID(2*i) in U at position
-equemene
+            IPID(2*i+1)-IA; ];
-equemene
+           else
-equemene
+            save corresponding local position of
-equemene
+            this row (LINDXA);
-equemene
+            save local position (-LINDXAU) in A
-equemene
+            where this row goes;
-equemene
+            [copy row IPID(2*i) in my piece of A
-equemene
+            at IPID(2*i+1);]
-equemene
+           end if
-equemene
+        else
-equemene
+           find k1 such that IPID(2*k1) = IPID(2*i+1);
-equemene
+           copy row IPID(2*i) in U at position
-equemene
+           IPID(2*k1+1)-IA;
-equemene
+           save corresponding local position of this
-equemene
+           row (LINDXA);
-equemene
+           save local position (LINDXAU) in U where
-equemene
+           this row goes;
-equemene
+        end if
-equemene
+     end if
-equemene
+  end for
 equemene
-equemene
+Second, if I am not the current row process  ICURROW, all source rows
-equemene
+in IPID that I own are part of U. Indeed,  they  are swapped with one
-equemene
+row  of  the  current  block  of rows,  and  the  main  factorization
-equemene
+algorithm proceeds one row after each other.  The processes different
-equemene
+from ICURROW,  should  exchange and accumulate  those rows until they
-equemene
+receive some data previously owned by the process ICURROW.
 equemene
-equemene
+In processes different from  ICURROW,  the  initial packing algorithm
-equemene
+proceeds as follows.  Consider a row of global index IPID(2*i) that I
-equemene
+own. When I will be receiving data previously owned by ICURROW, i.e.,
-equemene
+U, row IPID(2*i) should  replace the row in U at pos. IPID(2*i+1)-IA,
-equemene
+and  this particular row of U should be first copied into my piece of
-equemene
+A, at A(il,:),  where  il is the  local row  index  corresponding  to
-equemene
+IPID(2*i). Now,initially, this row will be packed into workspace, say
-equemene
+as the kth row of  that  work array.  The  following  algorithm  sets
-equemene
+LINDXAU[k] to IPID(2*i+1)-IA, that is the position in U where the row
-equemene
+should be copied. LINDXA(k) stores the local index in  A  where  this
-equemene
+row of U should be copied, i.e il.
 equemene
-equemene
+  for all entries in IPID,
-equemene
+     if IPID(2*i) is not in ICURROW,
-equemene
+        copy row IPID(2*i) in work array;
-equemene
+        save corresponding local position
-equemene
+        of this row (LINDXA);
-equemene
+        save position (LINDXAU) in U where
-equemene
+        this row should be copied;
-equemene
+     end if
-equemene
+  end for
 equemene
-equemene
+Since we are at it, we also globally figure  out  how many rows every
-equemene
+process has. That is necessary, because it would rather be cumbersome
-equemene
+to  figure it on  the fly  during the  bi-directional exchange phase.
-equemene
+This information is kept in the array  LLEN  of size NPROW. Also note
-equemene
+that the arrays LINDXA and LINDXAU are of max length equal to 2*N.
 equemene
-equemene
+<H1>Arguments</H1>
-equemene
+<PRE>
-equemene
+PANEL   (local input/output)          HPL_T_panel *
-equemene
+        On entry,  PANEL  points to the data structure containing the
-equemene
+        panel information.
-equemene
+</PRE>
-equemene
+<PRE>
-equemene
+K       (global input)                const int
-equemene
+        On entry, K specifies the number of entries in IPID.  K is at
-equemene
+        least 2*N, and at most 4*N.
-equemene
+</PRE>
-equemene
+<PRE>
-equemene
+IPID    (global input)                int *
-equemene
+        On entry,  IPID  is an array of length K. The first K entries
-equemene
+        of that array contain the src and final destination resulting
-equemene
+        from the application of the interchanges.
-equemene
+</PRE>
-equemene
+<PRE>
-equemene
+LINDXA  (local output)                int *
-equemene
+        On entry, LINDXA  is an array of dimension 2*N. On exit, this
-equemene
+        array contains the local indexes of the rows of A I have that
-equemene
+        should be copied into U.
-equemene
+</PRE>
-equemene
+<PRE>
-equemene
+LINDXAU (local output)                int *
-equemene
+        On exit, LINDXAU  is an array of dimension 2*N. On exit, this
-equemene
+        array contains  the local destination  information encoded as
-equemene
+        follows.  If LINDXAU(k) >= 0, row  LINDXA(k)  of A  is  to be
-equemene
+        copied in U at position LINDXAU(k).  Otherwise, row LINDXA(k)
-equemene
+        of A should be locally copied into A(-LINDXAU(k),:).
-equemene
+</PRE>
-equemene
+<PRE>
-equemene
+LLEN    (global output)               int *
-equemene
+        On entry,  LLEN  is  an array  of length  NPROW.  On exit, it
-equemene
+        contains how many rows every process has.
-equemene
+</PRE>
 equemene
-equemene
+<H1>See Also</H1>
-equemene
+<A HREF="HPL_pdlaswp00N.html">HPL_pdlaswp00N</A>,
-equemene
+<A HREF="HPL_pdlaswp00T.html">HPL_pdlaswp00T</A>,
-equemene
+<A HREF="HPL_pdlaswp01N.html">HPL_pdlaswp01N</A>,
-equemene
+<A HREF="HPL_pdlaswp01T.html">HPL_pdlaswp01T</A>.
 equemene
-equemene
+</BODY>
-equemene
+</HTML>