Statistiques
| Révision :

root / www / HPL_plindx0.html

Historique | Voir | Annoter | Télécharger (7,34 ko)

1 1 equemene
<HTML>
2 1 equemene
<HEAD>
3 1 equemene
<TITLE>HPL_plindx0 HPL 2.0 Library Functions September 10, 2008</TITLE>
4 1 equemene
</HEAD>
5 1 equemene
6 1 equemene
<BODY BGCOLOR="WHITE" TEXT = "#000000" LINK = "#0000ff" VLINK = "#000099"
7 1 equemene
      ALINK = "#ffff00">
8 1 equemene
9 1 equemene
<H1>Name</H1>
10 1 equemene
<B>HPL_plindx0</B> Compute local swapping index arrays.
11 1 equemene
12 1 equemene
<H1>Synopsis</H1>
13 1 equemene
<CODE>#include "hpl.h"</CODE><BR><BR>
14 1 equemene
<CODE>void</CODE>
15 1 equemene
<CODE>HPL_plindx0(</CODE>
16 1 equemene
<CODE>HPL_T_panel *</CODE>
17 1 equemene
<CODE>PANEL</CODE>,
18 1 equemene
<CODE>const int</CODE>
19 1 equemene
<CODE>K</CODE>,
20 1 equemene
<CODE>int *</CODE>
21 1 equemene
<CODE>IPID</CODE>,
22 1 equemene
<CODE>int *</CODE>
23 1 equemene
<CODE>LINDXA</CODE>,
24 1 equemene
<CODE>int *</CODE>
25 1 equemene
<CODE>LINDXAU</CODE>,
26 1 equemene
<CODE>int *</CODE>
27 1 equemene
<CODE>LLEN</CODE>
28 1 equemene
<CODE>);</CODE>
29 1 equemene
30 1 equemene
<H1>Description</H1>
31 1 equemene
<B>HPL_plindx0</B>
32 1 equemene
computes two local arrays  LINDXA and  LINDXAU  containing
33 1 equemene
the  local  source and final destination position  resulting from the
34 1 equemene
application of row interchanges.
35 1 equemene
36 1 equemene
On entry, the array  IPID  of length K is such that the row of global
37 1 equemene
index  IPID(i)  should be mapped onto row of global index  IPID(i+1).
38 1 equemene
Let  IA  be the global index of the first row to be swapped. For k in
39 1 equemene
[0..K/2), the row of global index IPID(2*k) should be mapped onto the
40 1 equemene
row of global index  IPID(2*k+1).  The question then, is to determine
41 1 equemene
which rows should ultimately be part of U.
42 1 equemene
43 1 equemene
First, some rows of the process ICURROW  may be swapped locally.  One
44 1 equemene
of this row belongs to U, the other one belongs to my local  piece of
45 1 equemene
A.  The other  rows of the current block are swapped with remote rows
46 1 equemene
and are thus not part of U. These rows however should be sent  along,
47 1 equemene
and  grabbed by the other processes  as we  progress in the  exchange
48 1 equemene
phase.
49 1 equemene
50 1 equemene
So, assume that I am  ICURROW  and consider a row of index  IPID(2*i)
51 1 equemene
that I own. If I own IPID(2*i+1) as well and IPID(2*i+1) - IA is less
52 1 equemene
than N,  this row is locally swapped and should be copied into  U  at
53 1 equemene
the position IPID(2*i+1) - IA. No row will be exchanged for this one.
54 1 equemene
If IPID(2*i+1)-IA is greater than N, then the row IPID(2*i) should be
55 1 equemene
locally copied into my local piece of A at the position corresponding
56 1 equemene
to the row of global index IPID(2*i+1).
57 1 equemene
58 1 equemene
If the process  ICURROW does not own  IPID(2*i+1), then row IPID(2*i)
59 1 equemene
is to be swapped away and strictly speaking does not belong to U, but
60 1 equemene
to  A  remotely.  Since this  process will however send this array U,
61 1 equemene
this row is  copied into  U, exactly where the row IPID(2*i+1) should
62 1 equemene
go. For this, we search IPID for k1, such that IPID(2*k1) is equal to
63 1 equemene
IPID(2*i+1); and row  IPID(2*i) is to be copied in U  at the position
64 1 equemene
IPID(2*k1+1)-IA.
65 1 equemene
66 1 equemene
It is thus  important to put the rows that go into U, i.e., such that
67 1 equemene
IPID(2*i+1) - IA is less than N at the begining of the array IPID. By
68 1 equemene
doing so,  U  is formed, and the local copy  is performed in just one
69 1 equemene
sweep.
70 1 equemene
71 1 equemene
Two lists  LINDXA  and  LINDXAU are built.  LINDXA contains the local
72 1 equemene
index of the rows I have that should be copied. LINDXAU  contains the
73 1 equemene
local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A
74 1 equemene
is to be copied in U at position LINDXAU(k). Otherwise, row LINDXA(k)
75 1 equemene
of A should be locally copied into A(-LINDXAU(k),:).  In the  process
76 1 equemene
ICURROW, the initial packing algorithm proceeds as follows.
77 1 equemene
78 1 equemene
  for all entries in IPID,
79 1 equemene
     if IPID(2*i) is in ICURROW,
80 1 equemene
        if IPID(2*i+1) is in ICURROW,
81 1 equemene
           if( IPID(2*i+1) - IA < N )
82 1 equemene
            save corresponding local position
83 1 equemene
            of this row (LINDXA);
84 1 equemene
            save local position (LINDXAU) in U
85 1 equemene
            where this row goes;
86 1 equemene
            [copy row IPID(2*i) in U at position
87 1 equemene
            IPID(2*i+1)-IA; ];
88 1 equemene
           else
89 1 equemene
            save corresponding local position of
90 1 equemene
            this row (LINDXA);
91 1 equemene
            save local position (-LINDXAU) in A
92 1 equemene
            where this row goes;
93 1 equemene
            [copy row IPID(2*i) in my piece of A
94 1 equemene
            at IPID(2*i+1);]
95 1 equemene
           end if
96 1 equemene
        else
97 1 equemene
           find k1 such that IPID(2*k1) = IPID(2*i+1);
98 1 equemene
           copy row IPID(2*i) in U at position
99 1 equemene
           IPID(2*k1+1)-IA;
100 1 equemene
           save corresponding local position of this
101 1 equemene
           row (LINDXA);
102 1 equemene
           save local position (LINDXAU) in U where
103 1 equemene
           this row goes;
104 1 equemene
        end if
105 1 equemene
     end if
106 1 equemene
  end for
107 1 equemene
108 1 equemene
Second, if I am not the current row process  ICURROW, all source rows
109 1 equemene
in IPID that I own are part of U. Indeed,  they  are swapped with one
110 1 equemene
row  of  the  current  block  of rows,  and  the  main  factorization
111 1 equemene
algorithm proceeds one row after each other.  The processes different
112 1 equemene
from ICURROW,  should  exchange and accumulate  those rows until they
113 1 equemene
receive some data previously owned by the process ICURROW.
114 1 equemene
115 1 equemene
In processes different from  ICURROW,  the  initial packing algorithm
116 1 equemene
proceeds as follows.  Consider a row of global index IPID(2*i) that I
117 1 equemene
own. When I will be receiving data previously owned by ICURROW, i.e.,
118 1 equemene
U, row IPID(2*i) should  replace the row in U at pos. IPID(2*i+1)-IA,
119 1 equemene
and  this particular row of U should be first copied into my piece of
120 1 equemene
A, at A(il,:),  where  il is the  local row  index  corresponding  to
121 1 equemene
IPID(2*i). Now,initially, this row will be packed into workspace, say
122 1 equemene
as the kth row of  that  work array.  The  following  algorithm  sets
123 1 equemene
LINDXAU[k] to IPID(2*i+1)-IA, that is the position in U where the row
124 1 equemene
should be copied. LINDXA(k) stores the local index in  A  where  this
125 1 equemene
row of U should be copied, i.e il.
126 1 equemene
127 1 equemene
  for all entries in IPID,
128 1 equemene
     if IPID(2*i) is not in ICURROW,
129 1 equemene
        copy row IPID(2*i) in work array;
130 1 equemene
        save corresponding local position
131 1 equemene
        of this row (LINDXA);
132 1 equemene
        save position (LINDXAU) in U where
133 1 equemene
        this row should be copied;
134 1 equemene
     end if
135 1 equemene
  end for
136 1 equemene
137 1 equemene
Since we are at it, we also globally figure  out  how many rows every
138 1 equemene
process has. That is necessary, because it would rather be cumbersome
139 1 equemene
to  figure it on  the fly  during the  bi-directional exchange phase.
140 1 equemene
This information is kept in the array  LLEN  of size NPROW. Also note
141 1 equemene
that the arrays LINDXA and LINDXAU are of max length equal to 2*N.
142 1 equemene
143 1 equemene
<H1>Arguments</H1>
144 1 equemene
<PRE>
145 1 equemene
PANEL   (local input/output)          HPL_T_panel *
146 1 equemene
        On entry,  PANEL  points to the data structure containing the
147 1 equemene
        panel information.
148 1 equemene
</PRE>
149 1 equemene
<PRE>
150 1 equemene
K       (global input)                const int
151 1 equemene
        On entry, K specifies the number of entries in IPID.  K is at
152 1 equemene
        least 2*N, and at most 4*N.
153 1 equemene
</PRE>
154 1 equemene
<PRE>
155 1 equemene
IPID    (global input)                int *
156 1 equemene
        On entry,  IPID  is an array of length K. The first K entries
157 1 equemene
        of that array contain the src and final destination resulting
158 1 equemene
        from the application of the interchanges.
159 1 equemene
</PRE>
160 1 equemene
<PRE>
161 1 equemene
LINDXA  (local output)                int *
162 1 equemene
        On entry, LINDXA  is an array of dimension 2*N. On exit, this
163 1 equemene
        array contains the local indexes of the rows of A I have that
164 1 equemene
        should be copied into U.
165 1 equemene
</PRE>
166 1 equemene
<PRE>
167 1 equemene
LINDXAU (local output)                int *
168 1 equemene
        On exit, LINDXAU  is an array of dimension 2*N. On exit, this
169 1 equemene
        array contains  the local destination  information encoded as
170 1 equemene
        follows.  If LINDXAU(k) >= 0, row  LINDXA(k)  of A  is  to be
171 1 equemene
        copied in U at position LINDXAU(k).  Otherwise, row LINDXA(k)
172 1 equemene
        of A should be locally copied into A(-LINDXAU(k),:).
173 1 equemene
</PRE>
174 1 equemene
<PRE>
175 1 equemene
LLEN    (global output)               int *
176 1 equemene
        On entry,  LLEN  is  an array  of length  NPROW.  On exit, it
177 1 equemene
        contains how many rows every process has.
178 1 equemene
</PRE>
179 1 equemene
180 1 equemene
<H1>See Also</H1>
181 1 equemene
<A HREF="HPL_pdlaswp00N.html">HPL_pdlaswp00N</A>,
182 1 equemene
<A HREF="HPL_pdlaswp00T.html">HPL_pdlaswp00T</A>,
183 1 equemene
<A HREF="HPL_pdlaswp01N.html">HPL_pdlaswp01N</A>,
184 1 equemene
<A HREF="HPL_pdlaswp01T.html">HPL_pdlaswp01T</A>.
185 1 equemene
186 1 equemene
</BODY>
187 1 equemene
</HTML>