.TH HPL_plindx0 3 "September 10, 2008" "HPL 2.0" "HPL Library Functions"
.SH NAME
HPL_plindx0 \- Compute local swapping index arrays.
.SH SYNOPSIS
\fB\&#include "hpl.h"\fR

\fB\&void\fR
\fB\&HPL_plindx0(\fR
\fB\&HPL_T_panel *\fR
\fI\&PANEL\fR,
\fB\&const int\fR
\fI\&K\fR,
\fB\&int *\fR
\fI\&IPID\fR,
\fB\&int *\fR
\fI\&LINDXA\fR,
\fB\&int *\fR
\fI\&LINDXAU\fR,
\fB\&int *\fR
\fI\&LLEN\fR
\fB\&);\fR
.SH DESCRIPTION
\fB\&HPL_plindx0\fR
computes two local arrays LINDXA and LINDXAU containing
the local source and final destination positions resulting from the
application of row interchanges.

On entry, the array IPID of length K holds K/2 pairs of global row
indexes. Let IA be the global index of the first row to be swapped, and
let N denote the number of rows of U. For k in [0..K/2), the row of
global index IPID(2*k) should be mapped onto the row of global index
IPID(2*k+1). The question, then, is to determine which rows should
ultimately be part of U.

First, some rows of the current process row ICURROW may be swapped
locally. In such a local swap, one of the two rows belongs to U and the
other belongs to my local piece of A. The other rows of the current
block are swapped with remote rows and are thus not part of U. These
rows should nevertheless be sent along and picked up by the other
processes as the exchange phase progresses.

So, assume that I am ICURROW and consider a row of index IPID(2*i)
that I own. If I also own IPID(2*i+1) and IPID(2*i+1) - IA is less
than N, this row is swapped locally and should be copied into U at
position IPID(2*i+1) - IA; no row will be exchanged for this one. If
IPID(2*i+1) - IA is greater than or equal to N, then row IPID(2*i)
should be copied locally into my piece of A, at the position
corresponding to the row of global index IPID(2*i+1).

If the process ICURROW does not own IPID(2*i+1), then row IPID(2*i)
is to be swapped away and, strictly speaking, belongs not to U but to
a remote piece of A. Since this process will send U anyway, this row
is copied into U exactly where the row IPID(2*i+1) should go. To find
that position, we search IPID for the index k1 such that IPID(2*k1)
equals IPID(2*i+1); row IPID(2*i) is then copied into U at position
IPID(2*k1+1) - IA.
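
The search for k1 can be sketched in C as follows. This is a minimal
illustration, not HPL code: the helper names find_source_pair and
remote_u_position are hypothetical, IPID is assumed to be a plain int
array of K entries, and a linear scan is assumed, since the text does
not say how the search is carried out.
.nf
```c
#include <assert.h>

/*
 * Hypothetical helper: index k1 of the pair whose source row is row,
 * i.e. ipid[2*k1] == row, or -1 if no such pair exists. K is the number
 * of entries in ipid (twice the number of pairs).
 */
int find_source_pair(const int *ipid, int K, int row)
{
    assert(K % 2 == 0);
    for (int k1 = 0; k1 < K / 2; k1++)
        if (ipid[2 * k1] == row)
            return k1;
    return -1;
}

/*
 * Position in U where row ipid[2*i] is parked when its destination
 * ipid[2*i+1] is owned by another process: the destination of the pair
 * whose source is ipid[2*i+1], minus ia. Returns -1 if there is no such pair.
 */
int remote_u_position(const int *ipid, int K, int i, int ia)
{
    int k1 = find_source_pair(ipid, K, ipid[2 * i + 1]);
    return (k1 < 0) ? -1 : ipid[2 * k1 + 1] - ia;
}
```
.fi

For example, with IPID = {10, 25, 25, 11} and IA = 10, row 10 is mapped
onto row 25, and row 25 is in turn mapped onto row 11, so row 10 is
parked in U at position 11 - 10 = 1.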

It is thus important to place the rows that go into U, i.e., those for
which IPID(2*i+1) - IA is less than N, at the beginning of the array
IPID. By doing so, U is formed and the local copy is performed in a
single sweep.
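
Placing the pairs that go into U first amounts to a stable partition of
the K/2 pairs of IPID on the test IPID(2*i+1) - IA < N. A minimal C
sketch, assuming a scratch buffer may be allocated, follows; the
function name partition_pairs_into_u is hypothetical, and the text does
not say how IPID is actually reordered in HPL.
.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: stably reorder the K/2 (source, destination)
 * pairs of ipid so that every pair whose destination falls inside U,
 * i.e. ipid[2*i+1] - ia < n, comes before the others.
 * Returns the number of pairs that go into U, or -1 on allocation failure.
 */
int partition_pairs_into_u(int *ipid, int K, int ia, int n)
{
    assert(K % 2 == 0);
    int *tmp = malloc((size_t)(K > 0 ? K : 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    int out = 0;
    /* First pass: pairs whose destination lies in U. */
    for (int i = 0; i < K / 2; i++) {
        if (ipid[2 * i + 1] - ia < n) {
            tmp[out++] = ipid[2 * i];
            tmp[out++] = ipid[2 * i + 1];
        }
    }
    int in_u = out / 2;

    /* Second pass: the remaining pairs, in their original order. */
    for (int i = 0; i < K / 2; i++) {
        if (ipid[2 * i + 1] - ia >= n) {
            tmp[out++] = ipid[2 * i];
            tmp[out++] = ipid[2 * i + 1];
        }
    }

    memcpy(ipid, tmp, (size_t)K * sizeof *tmp);
    free(tmp);
    return in_u;
}
```
.fi

With IA = 10 and N = 4, the pairs (12,40), (40,12), (30,11), (11,30)
become (40,12), (30,11), (12,40), (11,30), and the function returns 2.
The stable version keeps the original relative order within each group;
an in-place, unstable partition would also satisfy the requirement
stated above.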

Two lists, LINDXA and LINDXAU, are built. LINDXA contains the local
indexes of the rows I own that should be copied. LINDXAU contains the
local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A
is to be copied into U at position LINDXAU(k); otherwise, row LINDXA(k)
of A should be copied locally into A(-LINDXAU(k),:). In the process
ICURROW, the initial packing algorithm proceeds as follows.

.nf
for each pair i in [0..K/2) of IPID,
   if IPID(2*i) is in ICURROW,
      if IPID(2*i+1) is in ICURROW,
         if( IPID(2*i+1) - IA < N )
            save corresponding local position
            of this row (LINDXA);
            save local position (LINDXAU) in U
            where this row goes;
            [copy row IPID(2*i) in U at position
            IPID(2*i+1)-IA; ];
         else
            save corresponding local position of
            this row (LINDXA);
            save local position (-LINDXAU) in A
            where this row goes;
            [copy row IPID(2*i) in my piece of A
            at IPID(2*i+1);]
         end if
      else
         find k1 such that IPID(2*k1) = IPID(2*i+1);
         copy row IPID(2*i) in U at position
         IPID(2*k1+1)-IA;
         save corresponding local position of this
         row (LINDXA);
         save local position (LINDXAU) in U where
         this row goes;
      end if
   end if
end for
.fi
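
The packing loop above can be written in C roughly as follows. This is
a sketch under stated assumptions, not HPL code: row ownership is
modeled by a hypothetical row_dist type with a plain block-cyclic
mapping (block size nb, nprow process rows, first block on process row
0), whereas the actual HPL mapping depends on the panel. The bracketed
row copies are omitted; only LINDXA and LINDXAU are filled.
.nf
```c
#include <assert.h>

/* Assumed 1-D block-cyclic row distribution (illustrative only). */
typedef struct {
    int nb;    /* row block size */
    int nprow; /* number of process rows */
} row_dist;

/* Process row owning global row g (first block on process row 0). */
int row_owner(const row_dist *d, int g) { return (g / d->nb) % d->nprow; }

/* Local row index of global row g on its owning process row. */
int row_local(const row_dist *d, int g)
{
    return (g / (d->nb * d->nprow)) * d->nb + g % d->nb;
}

/* Index k1 of the pair whose source is row, or -1 if there is none. */
int find_source_pair(const int *ipid, int K, int row)
{
    for (int k1 = 0; k1 < K / 2; k1++)
        if (ipid[2 * k1] == row)
            return k1;
    return -1;
}

/*
 * Packing on process row icurrow: fills lindxa/lindxau (capacity >= K/2)
 * following the pseudocode and returns the number of entries written.
 */
int pack_icurrow(const row_dist *d, int icurrow, const int *ipid, int K,
                 int ia, int n, int *lindxa, int *lindxau)
{
    assert(K % 2 == 0);
    int ip = 0;
    for (int i = 0; i < K / 2; i++) {
        int src = ipid[2 * i];
        int dst = ipid[2 * i + 1];
        if (row_owner(d, src) != icurrow)
            continue;                             /* not a row of ICURROW */
        lindxa[ip] = row_local(d, src);           /* where the row is now */
        if (row_owner(d, dst) == icurrow) {
            if (dst - ia < n)
                lindxau[ip] = dst - ia;           /* local swap into U */
            else
                lindxau[ip] = -row_local(d, dst); /* local copy within A */
        } else {
            int k1 = find_source_pair(ipid, K, dst);
            assert(k1 >= 0);                      /* assumed to exist */
            lindxau[ip] = ipid[2 * k1 + 1] - ia;  /* park in U where dst goes */
        }
        ip++;
    }
    return ip;
}
```
.fi

For instance, with nb = 2, nprow = 2, IA = 0, N = 2, ICURROW = 0, and
IPID = {5,0, 0,5, 2,1, 1,2} (rows 0 and 5 interchanged, rows 1 and 2
interchanged), the function returns 3 with LINDXA = {3, 0, 1} and
LINDXAU = {0, -3, 1}.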

Second, if I am not the current process row ICURROW, all source rows
in IPID that I own are part of U. Indeed, each of them is swapped with
one row of the current block of rows, and the main factorization
algorithm proceeds one row after the other. The processes other than
ICURROW should exchange and accumulate those rows until they receive
some data previously owned by the process ICURROW.

In processes other than ICURROW, the initial packing algorithm
proceeds as follows. Consider a row of global index IPID(2*i) that I
own. When I receive data previously owned by ICURROW, i.e., U, row
IPID(2*i) should replace the row of U at position IPID(2*i+1) - IA,
and this particular row of U should first be copied into my piece of
A, at A(il,:), where il is the local row index corresponding to
IPID(2*i). Initially, this row will be packed into a workspace, say
as the kth row of that work array. The following algorithm sets
LINDXAU(k) to IPID(2*i+1) - IA, that is, the position in U where the
row should be copied. LINDXA(k) stores the local index in A where
this row of U should be copied, i.e., il.

.nf
for each pair i in [0..K/2) of IPID,
   if IPID(2*i) is mine (and hence not in ICURROW),
      copy row IPID(2*i) in work array;
      save corresponding local position
      of this row (LINDXA);
      save position (LINDXAU) in U where
      this row should be copied;
   end if
end for
.fi
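
A corresponding C sketch for a process other than ICURROW follows. It
reuses the same hypothetical block-cyclic ownership model (row_dist,
row_owner, row_local) and additionally assumes that the local piece of
A and the work array are column-major with leading dimensions lda and
ldw. The function name pack_non_icurrow and these layouts are
assumptions made for illustration, not the HPL implementation.
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Assumed 1-D block-cyclic row distribution (illustrative only). */
typedef struct {
    int nb;    /* row block size */
    int nprow; /* number of process rows */
} row_dist;

int row_owner(const row_dist *d, int g) { return (g / d->nb) % d->nprow; }

int row_local(const row_dist *d, int g)
{
    return (g / (d->nb * d->nprow)) * d->nb + g % d->nb;
}

/*
 * Packing on a process row myrow other than ICURROW. For each pair whose
 * source row I own: copy that row into row ip of work, record its local
 * index il in lindxa[ip] (where the incoming row of U will be stored),
 * and record in lindxau[ip] the row of U it will replace.
 * Returns the number of packed rows.
 */
int pack_non_icurrow(const row_dist *d, int myrow, const int *ipid, int K,
                     int ia, const double *A, int lda, int ncols,
                     double *work, int ldw, int *lindxa, int *lindxau)
{
    assert(K % 2 == 0);
    int ip = 0;
    for (int i = 0; i < K / 2; i++) {
        int src = ipid[2 * i];
        if (row_owner(d, src) != myrow)
            continue;                           /* only rows I own */
        int il = row_local(d, src);
        for (int j = 0; j < ncols; j++)         /* row il of A -> row ip of work */
            work[ip + (size_t)j * ldw] = A[il + (size_t)j * lda];
        lindxa[ip] = il;
        lindxau[ip] = ipid[2 * i + 1] - ia;
        ip++;
    }
    return ip;
}
```
.fi

With the same example as for ICURROW (nb = 2, nprow = 2, IA = 0,
IPID = {5,0, 0,5, 2,1, 1,2}), process row 1 owns only row 2 among the
source rows, so it packs a single row: LINDXA = {0}, the local index of
global row 2, and LINDXAU = {1}.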

While we are at it, we also figure out globally how many rows every
process has. This is necessary because it would be rather cumbersome
to determine it on the fly during the bi-directional exchange phase.
This information is kept in the array LLEN of size NPROW. Also note
that the arrays LINDXA and LINDXAU have a maximum length of 2*N.
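
On the process ICURROW, applying the two lists reduces to decoding the
sign of each LINDXAU(k). The C sketch below shows that decoding
together with a single row copy. The type row_dest, the functions
decode_lindxau and copy_row, and the column-major layouts of A and U
(leading dimensions lda and ldu) are all assumptions made for
illustration, not the HPL API.
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Where row LINDXA(k) of A is sent (assumed representation). */
typedef enum { DEST_U, DEST_A } dest_kind;

typedef struct {
    dest_kind kind; /* DEST_U: a row of U; DEST_A: a row of my piece of A */
    int row;        /* row index within U or within my piece of A */
} row_dest;

/* Decode one LINDXAU entry: >= 0 means a row of U, < 0 means row -value of A. */
row_dest decode_lindxau(int lindxau_k)
{
    row_dest d;
    if (lindxau_k >= 0) {
        d.kind = DEST_U;
        d.row = lindxau_k;
    } else {
        d.kind = DEST_A;
        d.row = -lindxau_k;
    }
    return d;
}

/*
 * Copy row src_row of A (column-major, leading dimension lda, ncols
 * columns) to the destination described by d. U is assumed column-major
 * with leading dimension ldu.
 */
void copy_row(const row_dest *d, int src_row, double *A, int lda,
              double *U, int ldu, int ncols)
{
    assert(d->row >= 0 && src_row >= 0);
    double *dst = (d->kind == DEST_U) ? U : A;
    int ldd = (d->kind == DEST_U) ? ldu : lda;
    for (int j = 0; j < ncols; j++)
        dst[d->row + (size_t)j * ldd] = A[src_row + (size_t)j * lda];
}
```
.fi

Applying entries one at a time like this ignores any ordering
constraints between copies that overwrite rows of A still to be read;
the sketch only illustrates the sign encoding of LINDXAU.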
.SH ARGUMENTS
.TP 8
PANEL (local input/output) HPL_T_panel *
On entry, PANEL points to the data structure containing the
panel information.
.TP 8
K (global input) const int
On entry, K specifies the number of entries in IPID. K is at
least 2*N, and at most 4*N.
.TP 8
IPID (global input) int *
On entry, IPID is an array of length K. The first K entries
of that array contain the source and final destination rows
resulting from the application of the interchanges.
.TP 8
LINDXA (local output) int *
On entry, LINDXA is an array of dimension 2*N. On exit, this
array contains the local indexes of the rows of A I own that
should be copied (into U or into my piece of A, as encoded in
LINDXAU).
.TP 8
LINDXAU (local output) int *
On entry, LINDXAU is an array of dimension 2*N. On exit, this
array contains the local destination information encoded as
follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be
copied into U at position LINDXAU(k). Otherwise, row LINDXA(k)
of A should be copied locally into A(-LINDXAU(k),:).
.TP 8
LLEN (global output) int *
On entry, LLEN is an array of length NPROW. On exit, it
contains the number of rows every process has.
.SH SEE ALSO
.BR HPL_pdlaswp00N (3),
.BR HPL_pdlaswp00T (3),
.BR HPL_pdlaswp01N (3),
.BR HPL_pdlaswp01T (3).