root / www / HPL_plindx0.html
Historique | Voir | Annoter | Télécharger (7,34 ko)
1 | 1 | equemene | <HTML>
|
---|---|---|---|
2 | 1 | equemene | <HEAD>
|
3 | 1 | equemene | <TITLE>HPL_plindx0 HPL 2.0 Library Functions September 10, 2008</TITLE> |
4 | 1 | equemene | </HEAD>
|
5 | 1 | equemene | |
6 | 1 | equemene | <BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099" |
7 | 1 | equemene | ALINK = "#ffff00"> |
8 | 1 | equemene | |
9 | 1 | equemene | <H1>Name</H1> |
10 | 1 | equemene | <B>HPL_plindx0</B> Compute local swapping index arrays. |
11 | 1 | equemene | |
12 | 1 | equemene | <H1>Synopsis</H1> |
13 | 1 | equemene | <CODE>#include "hpl.h"</CODE><BR><BR> |
14 | 1 | equemene | <CODE>void</CODE> |
15 | 1 | equemene | <CODE>HPL_plindx0(</CODE> |
16 | 1 | equemene | <CODE>HPL_T_panel *</CODE> |
17 | 1 | equemene | <CODE>PANEL</CODE>, |
18 | 1 | equemene | <CODE>const int</CODE> |
19 | 1 | equemene | <CODE>K</CODE>, |
20 | 1 | equemene | <CODE>int *</CODE> |
21 | 1 | equemene | <CODE>IPID</CODE>, |
22 | 1 | equemene | <CODE>int *</CODE> |
23 | 1 | equemene | <CODE>LINDXA</CODE>, |
24 | 1 | equemene | <CODE>int *</CODE> |
25 | 1 | equemene | <CODE>LINDXAU</CODE>, |
26 | 1 | equemene | <CODE>int *</CODE> |
27 | 1 | equemene | <CODE>LLEN</CODE> |
28 | 1 | equemene | <CODE>);</CODE> |
29 | 1 | equemene | |
30 | 1 | equemene | <H1>Description</H1> |
31 | 1 | equemene | <B>HPL_plindx0</B> |
32 | 1 | equemene | computes two local arrays LINDXA and LINDXAU containing |
33 | 1 | equemene | the local source and final destination position resulting from the |
34 | 1 | equemene | application of row interchanges. |
35 | 1 | equemene | |
36 | 1 | equemene | On entry, the array IPID of length K is such that the row of global |
37 | 1 | equemene | index IPID(i) should be mapped onto row of global index IPID(i+1). |
38 | 1 | equemene | Let IA be the global index of the first row to be swapped. For k in |
39 | 1 | equemene | [0..K/2), the row of global index IPID(2*k) should be mapped onto the |
40 | 1 | equemene | row of global index IPID(2*k+1). The question then, is to determine |
41 | 1 | equemene | which rows should ultimately be part of U. |
42 | 1 | equemene | |
43 | 1 | equemene | First, some rows of the process ICURROW may be swapped locally. One |
44 | 1 | equemene | of this row belongs to U, the other one belongs to my local piece of |
45 | 1 | equemene | A. The other rows of the current block are swapped with remote rows |
46 | 1 | equemene | and are thus not part of U. These rows however should be sent along, |
47 | 1 | equemene | and grabbed by the other processes as we progress in the exchange |
48 | 1 | equemene | phase. |
49 | 1 | equemene | |
50 | 1 | equemene | So, assume that I am ICURROW and consider a row of index IPID(2*i) |
51 | 1 | equemene | that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less |
52 | 1 | equemene | than N, this row is locally swapped and should be copied into U at |
53 | 1 | equemene | the position IPID(2*i+1) - IA. No row will be exchanged for this one. |
54 | 1 | equemene | If IPID(2*i+1)-IA is greater than N, then the row IPID(2*i) should be |
55 | 1 | equemene | locally copied into my local piece of A at the position corresponding |
56 | 1 | equemene | to the row of global index IPID(2*i+1). |
57 | 1 | equemene | |
58 | 1 | equemene | If the process ICURROW does not own IPID(2*i+1), then row IPID(2*i) |
59 | 1 | equemene | is to be swapped away and strictly speaking does not belong to U, but |
60 | 1 | equemene | to A remotely. Since this process will however send this array U, |
61 | 1 | equemene | this row is copied into U, exactly where the row IPID(2*i+1) should |
62 | 1 | equemene | go. For this, we search IPID for k1, such that IPID(2*k1) is equal to |
63 | 1 | equemene | IPID(2*i+1); and row IPID(2*i) is to be copied in U at the position |
64 | 1 | equemene | IPID(2*k1+1)-IA. |
65 | 1 | equemene | |
66 | 1 | equemene | It is thus important to put the rows that go into U, i.e., such that |
67 | 1 | equemene | IPID(2*i+1) - IA is less than N at the begining of the array IPID. By |
68 | 1 | equemene | doing so, U is formed, and the local copy is performed in just one |
69 | 1 | equemene | sweep. |
70 | 1 | equemene | |
71 | 1 | equemene | Two lists LINDXA and LINDXAU are built. LINDXA contains the local |
72 | 1 | equemene | index of the rows I have that should be copied. LINDXAU contains the |
73 | 1 | equemene | local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A
|
74 | 1 | equemene | is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) |
75 | 1 | equemene | of A should be locally copied into A(-LINDXAU(k),:). In the process |
76 | 1 | equemene | ICURROW, the initial packing algorithm proceeds as follows. |
77 | 1 | equemene | |
78 | 1 | equemene | for all entries in IPID, |
79 | 1 | equemene | if IPID(2*i) is in ICURROW, |
80 | 1 | equemene | if IPID(2*i+1) is in ICURROW, |
81 | 1 | equemene | if( IPID(2*i+1) - IA < N )
|
82 | 1 | equemene | save corresponding local position |
83 | 1 | equemene | of this row (LINDXA); |
84 | 1 | equemene | save local position (LINDXAU) in U |
85 | 1 | equemene | where this row goes; |
86 | 1 | equemene | [copy row IPID(2*i) in U at position |
87 | 1 | equemene | IPID(2*i+1)-IA; ]; |
88 | 1 | equemene | else |
89 | 1 | equemene | save corresponding local position of |
90 | 1 | equemene | this row (LINDXA); |
91 | 1 | equemene | save local position (-LINDXAU) in A |
92 | 1 | equemene | where this row goes; |
93 | 1 | equemene | [copy row IPID(2*i) in my piece of A |
94 | 1 | equemene | at IPID(2*i+1);] |
95 | 1 | equemene | end if |
96 | 1 | equemene | else |
97 | 1 | equemene | find k1 such that IPID(2*k1) = IPID(2*i+1); |
98 | 1 | equemene | copy row IPID(2*i) in U at position |
99 | 1 | equemene | IPID(2*k1+1)-IA; |
100 | 1 | equemene | save corresponding local position of this |
101 | 1 | equemene | row (LINDXA); |
102 | 1 | equemene | save local position (LINDXAU) in U where |
103 | 1 | equemene | this row goes; |
104 | 1 | equemene | end if |
105 | 1 | equemene | end if |
106 | 1 | equemene | end for |
107 | 1 | equemene | |
108 | 1 | equemene | Second, if I am not the current row process ICURROW, all source rows |
109 | 1 | equemene | in IPID that I own are part of U. Indeed, they are swapped with one |
110 | 1 | equemene | row of the current block of rows, and the main factorization |
111 | 1 | equemene | algorithm proceeds one row after each other. The processes different |
112 | 1 | equemene | from ICURROW, should exchange and accumulate those rows until they |
113 | 1 | equemene | receive some data previously owned by the process ICURROW. |
114 | 1 | equemene | |
115 | 1 | equemene | In processes different from ICURROW, the initial packing algorithm |
116 | 1 | equemene | proceeds as follows. Consider a row of global index IPID(2*i) that I |
117 | 1 | equemene | own. When I will be receiving data previously owned by ICURROW, i.e., |
118 | 1 | equemene | U, row IPID(2*i) should replace the row in U at pos. IPID(2*i+1)-IA, |
119 | 1 | equemene | and this particular row of U should be first copied into my piece of |
120 | 1 | equemene | A, at A(il,:), where il is the local row index corresponding to |
121 | 1 | equemene | IPID(2*i). Now,initially, this row will be packed into workspace, say |
122 | 1 | equemene | as the kth row of that work array. The following algorithm sets |
123 | 1 | equemene | LINDXAU[k] to IPID(2*i+1)-IA, that is the position in U where the row |
124 | 1 | equemene | should be copied. LINDXA(k) stores the local index in A where this |
125 | 1 | equemene | row of U should be copied, i.e il. |
126 | 1 | equemene | |
127 | 1 | equemene | for all entries in IPID, |
128 | 1 | equemene | if IPID(2*i) is not in ICURROW, |
129 | 1 | equemene | copy row IPID(2*i) in work array; |
130 | 1 | equemene | save corresponding local position |
131 | 1 | equemene | of this row (LINDXA); |
132 | 1 | equemene | save position (LINDXAU) in U where |
133 | 1 | equemene | this row should be copied; |
134 | 1 | equemene | end if |
135 | 1 | equemene | end for |
136 | 1 | equemene | |
137 | 1 | equemene | Since we are at it, we also globally figure out how many rows every |
138 | 1 | equemene | process has. That is necessary, because it would rather be cumbersome |
139 | 1 | equemene | to figure it on the fly during the bi-directional exchange phase. |
140 | 1 | equemene | This information is kept in the array LLEN of size NPROW. Also note |
141 | 1 | equemene | that the arrays LINDXA and LINDXAU are of max length equal to 2*N. |
142 | 1 | equemene | |
143 | 1 | equemene | <H1>Arguments</H1> |
144 | 1 | equemene | <PRE>
|
145 | 1 | equemene | PANEL (local input/output) HPL_T_panel * |
146 | 1 | equemene | On entry, PANEL points to the data structure containing the |
147 | 1 | equemene | panel information. |
148 | 1 | equemene | </PRE>
|
149 | 1 | equemene | <PRE>
|
150 | 1 | equemene | K (global input) const int |
151 | 1 | equemene | On entry, K specifies the number of entries in IPID. K is at |
152 | 1 | equemene | least 2*N, and at most 4*N. |
153 | 1 | equemene | </PRE>
|
154 | 1 | equemene | <PRE>
|
155 | 1 | equemene | IPID (global input) int * |
156 | 1 | equemene | On entry, IPID is an array of length K. The first K entries |
157 | 1 | equemene | of that array contain the src and final destination resulting |
158 | 1 | equemene | from the application of the interchanges. |
159 | 1 | equemene | </PRE>
|
160 | 1 | equemene | <PRE>
|
161 | 1 | equemene | LINDXA (local output) int * |
162 | 1 | equemene | On entry, LINDXA is an array of dimension 2*N. On exit, this |
163 | 1 | equemene | array contains the local indexes of the rows of A I have that |
164 | 1 | equemene | should be copied into U. |
165 | 1 | equemene | </PRE>
|
166 | 1 | equemene | <PRE>
|
167 | 1 | equemene | LINDXAU (local output) int * |
168 | 1 | equemene | On exit, LINDXAU is an array of dimension 2*N. On exit, this |
169 | 1 | equemene | array contains the local destination information encoded as |
170 | 1 | equemene | follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be
|
171 | 1 | equemene | copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) |
172 | 1 | equemene | of A should be locally copied into A(-LINDXAU(k),:). |
173 | 1 | equemene | </PRE>
|
174 | 1 | equemene | <PRE>
|
175 | 1 | equemene | LLEN (global output) int * |
176 | 1 | equemene | On entry, LLEN is an array of length NPROW. On exit, it |
177 | 1 | equemene | contains how many rows every process has. |
178 | 1 | equemene | </PRE>
|
179 | 1 | equemene | |
180 | 1 | equemene | <H1>See Also</H1> |
181 | 1 | equemene | <A HREF="HPL_pdlaswp00N.html">HPL_pdlaswp00N</A>, |
182 | 1 | equemene | <A HREF="HPL_pdlaswp00T.html">HPL_pdlaswp00T</A>, |
183 | 1 | equemene | <A HREF="HPL_pdlaswp01N.html">HPL_pdlaswp01N</A>, |
184 | 1 | equemene | <A HREF="HPL_pdlaswp01T.html">HPL_pdlaswp01T</A>. |
185 | 1 | equemene | |
186 | 1 | equemene | </BODY>
|
187 | 1 | equemene | </HTML> |