root / man / man3 / HPL_plindx0.3 @ 1
Historique | Voir | Annoter | Télécharger (6,78 ko)
1 |
.TH HPL_plindx0 3 "September 10, 2008" "HPL 2.0" "HPL Library Functions" |
---|---|
2 |
.SH NAME |
3 |
HPL_plindx0 \- Compute local swapping index arrays. |
4 |
.SH SYNOPSIS |
5 |
\fB\&#include "hpl.h"\fR |
6 |
|
7 |
\fB\&void\fR |
8 |
\fB\&HPL_plindx0(\fR |
9 |
\fB\&HPL_T_panel *\fR |
10 |
\fI\&PANEL\fR, |
11 |
\fB\&const int\fR |
12 |
\fI\&K\fR, |
13 |
\fB\&int *\fR |
14 |
\fI\&IPID\fR, |
15 |
\fB\&int *\fR |
16 |
\fI\&LINDXA\fR, |
17 |
\fB\&int *\fR |
18 |
\fI\&LINDXAU\fR, |
19 |
\fB\&int *\fR |
20 |
\fI\&LLEN\fR |
21 |
\fB\&);\fR |
22 |
.SH DESCRIPTION |
23 |
\fB\&HPL_plindx0\fR |
24 |
computes two local arrays LINDXA and LINDXAU containing |
25 |
the local source and final destination position resulting from the |
26 |
application of row interchanges. |
27 |
|
28 |
On entry, the array IPID of length K is such that the row of global |
29 |
index IPID(i) should be mapped onto row of global index IPID(i+1). |
30 |
Let IA be the global index of the first row to be swapped. For k in |
31 |
[0..K/2), the row of global index IPID(2*k) should be mapped onto the |
32 |
row of global index IPID(2*k+1). The question then, is to determine |
33 |
which rows should ultimately be part of U. |
34 |
|
35 |
First, some rows of the process ICURROW may be swapped locally. One |
36 |
of this row belongs to U, the other one belongs to my local piece of |
37 |
A. The other rows of the current block are swapped with remote rows |
38 |
and are thus not part of U. These rows however should be sent along, |
39 |
and grabbed by the other processes as we progress in the exchange |
40 |
phase. |
41 |
|
42 |
So, assume that I am ICURROW and consider a row of index IPID(2*i) |
43 |
that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less |
44 |
than N, this row is locally swapped and should be copied into U at |
45 |
the position IPID(2*i+1) - IA. No row will be exchanged for this one. |
46 |
If IPID(2*i+1)-IA is greater than N, then the row IPID(2*i) should be |
47 |
locally copied into my local piece of A at the position corresponding |
48 |
to the row of global index IPID(2*i+1). |
49 |
|
50 |
If the process ICURROW does not own IPID(2*i+1), then row IPID(2*i) |
51 |
is to be swapped away and strictly speaking does not belong to U, but |
52 |
to A remotely. Since this process will however send this array U, |
53 |
this row is copied into U, exactly where the row IPID(2*i+1) should |
54 |
go. For this, we search IPID for k1, such that IPID(2*k1) is equal to |
55 |
IPID(2*i+1); and row IPID(2*i) is to be copied in U at the position |
56 |
IPID(2*k1+1)-IA. |
57 |
|
58 |
It is thus important to put the rows that go into U, i.e., such that |
59 |
IPID(2*i+1) - IA is less than N at the begining of the array IPID. By |
60 |
doing so, U is formed, and the local copy is performed in just one |
61 |
sweep. |
62 |
|
63 |
Two lists LINDXA and LINDXAU are built. LINDXA contains the local |
64 |
index of the rows I have that should be copied. LINDXAU contains the |
65 |
local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A |
66 |
is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) |
67 |
of A should be locally copied into A(-LINDXAU(k),:). In the process |
68 |
ICURROW, the initial packing algorithm proceeds as follows. |
69 |
|
70 |
for all entries in IPID, |
71 |
if IPID(2*i) is in ICURROW, |
72 |
if IPID(2*i+1) is in ICURROW, |
73 |
if( IPID(2*i+1) - IA < N ) |
74 |
save corresponding local position |
75 |
of this row (LINDXA); |
76 |
save local position (LINDXAU) in U |
77 |
where this row goes; |
78 |
[copy row IPID(2*i) in U at position |
79 |
IPID(2*i+1)-IA; ]; |
80 |
else |
81 |
save corresponding local position of |
82 |
this row (LINDXA); |
83 |
save local position (-LINDXAU) in A |
84 |
where this row goes; |
85 |
[copy row IPID(2*i) in my piece of A |
86 |
at IPID(2*i+1);] |
87 |
end if |
88 |
else |
89 |
find k1 such that IPID(2*k1) = IPID(2*i+1); |
90 |
copy row IPID(2*i) in U at position |
91 |
IPID(2*k1+1)-IA; |
92 |
save corresponding local position of this |
93 |
row (LINDXA); |
94 |
save local position (LINDXAU) in U where |
95 |
this row goes; |
96 |
end if |
97 |
end if |
98 |
end for |
99 |
|
100 |
Second, if I am not the current row process ICURROW, all source rows |
101 |
in IPID that I own are part of U. Indeed, they are swapped with one |
102 |
row of the current block of rows, and the main factorization |
103 |
algorithm proceeds one row after each other. The processes different |
104 |
from ICURROW, should exchange and accumulate those rows until they |
105 |
receive some data previously owned by the process ICURROW. |
106 |
|
107 |
In processes different from ICURROW, the initial packing algorithm |
108 |
proceeds as follows. Consider a row of global index IPID(2*i) that I |
109 |
own. When I will be receiving data previously owned by ICURROW, i.e., |
110 |
U, row IPID(2*i) should replace the row in U at pos. IPID(2*i+1)-IA, |
111 |
and this particular row of U should be first copied into my piece of |
112 |
A, at A(il,:), where il is the local row index corresponding to |
113 |
IPID(2*i). Now,initially, this row will be packed into workspace, say |
114 |
as the kth row of that work array. The following algorithm sets |
115 |
LINDXAU[k] to IPID(2*i+1)-IA, that is the position in U where the row |
116 |
should be copied. LINDXA(k) stores the local index in A where this |
117 |
row of U should be copied, i.e il. |
118 |
|
119 |
for all entries in IPID, |
120 |
if IPID(2*i) is not in ICURROW, |
121 |
copy row IPID(2*i) in work array; |
122 |
save corresponding local position |
123 |
of this row (LINDXA); |
124 |
save position (LINDXAU) in U where |
125 |
this row should be copied; |
126 |
end if |
127 |
end for |
128 |
|
129 |
Since we are at it, we also globally figure out how many rows every |
130 |
process has. That is necessary, because it would rather be cumbersome |
131 |
to figure it on the fly during the bi-directional exchange phase. |
132 |
This information is kept in the array LLEN of size NPROW. Also note |
133 |
that the arrays LINDXA and LINDXAU are of max length equal to 2*N. |
134 |
.SH ARGUMENTS |
135 |
.TP 8 |
136 |
PANEL (local input/output) HPL_T_panel * |
137 |
On entry, PANEL points to the data structure containing the |
138 |
panel information. |
139 |
.TP 8 |
140 |
K (global input) const int |
141 |
On entry, K specifies the number of entries in IPID. K is at |
142 |
least 2*N, and at most 4*N. |
143 |
.TP 8 |
144 |
IPID (global input) int * |
145 |
On entry, IPID is an array of length K. The first K entries |
146 |
of that array contain the src and final destination resulting |
147 |
from the application of the interchanges. |
148 |
.TP 8 |
149 |
LINDXA (local output) int * |
150 |
On entry, LINDXA is an array of dimension 2*N. On exit, this |
151 |
array contains the local indexes of the rows of A I have that |
152 |
should be copied into U. |
153 |
.TP 8 |
154 |
LINDXAU (local output) int * |
155 |
On exit, LINDXAU is an array of dimension 2*N. On exit, this |
156 |
array contains the local destination information encoded as |
157 |
follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be |
158 |
copied in U at position LINDXAU(k). Otherwise, row LINDXA(k) |
159 |
of A should be locally copied into A(-LINDXAU(k),:). |
160 |
.TP 8 |
161 |
LLEN (global output) int * |
162 |
On entry, LLEN is an array of length NPROW. On exit, it |
163 |
contains how many rows every process has. |
164 |
.SH SEE ALSO |
165 |
.BR HPL_pdlaswp00N \ (3), |
166 |
.BR HPL_pdlaswp00T \ (3), |
167 |
.BR HPL_pdlaswp01N \ (3), |
168 |
.BR HPL_pdlaswp01T \ (3). |