.TH HPL_plindx0 3 "September 10, 2008" "HPL 2.0" "HPL Library Functions"
.SH NAME
HPL_plindx0 \- Compute local swapping index arrays.
.SH SYNOPSIS
\fB\&#include "hpl.h"\fR

\fB\&void\fR
\fB\&HPL_plindx0(\fR
\fB\&HPL_T_panel *\fR
\fI\&PANEL\fR,
\fB\&const int\fR
\fI\&K\fR,
\fB\&int *\fR
\fI\&IPID\fR,
\fB\&int *\fR
\fI\&LINDXA\fR,
\fB\&int *\fR
\fI\&LINDXAU\fR,
\fB\&int *\fR
\fI\&LLEN\fR
\fB\&);\fR
.SH DESCRIPTION
\fB\&HPL_plindx0\fR
computes two local arrays LINDXA and LINDXAU containing the local
source and final destination positions resulting from the application
of row interchanges.
.PP
On entry, the array IPID of length K holds K/2 pairs of global row
indices. Let IA be the global index of the first row to be swapped,
and let N be the number of rows of U. For k in [0..K/2), the row of
global index IPID(2*k) should be mapped onto the row of global index
IPID(2*k+1). The question, then, is to determine which rows should
ultimately be part of U.
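The pairing of IPID can be illustrated with a small C sketch. It assumes that IPID is a 0-based C array ipid, with ipid[j] standing for IPID(j), and that n holds N; the function name count_pairs_into_u is hypothetical and not part of HPL. It only counts the pairs whose destination falls inside U.

```c
#include <assert.h>

/* Hypothetical helper.  IPID is read as K/2 (source, destination) pairs
 * of global row indices: pair k maps row ipid[2*k] onto row ipid[2*k+1].
 * Returns how many pairs have their destination inside U, i.e. for
 * which ipid[2*k+1] - ia < n. */
int count_pairs_into_u(const int *ipid, int K, int ia, int n)
{
    assert(K % 2 == 0);
    int count = 0;
    for (int k = 0; k < K / 2; k++)
        if (ipid[2 * k + 1] - ia < n)   /* destination offset inside U */
            count++;
    return count;
}
```

For example, under this reading, with IA = 10, N = 2, and rows 10 and 12 as well as rows 11 and 15 interchanged, IPID would hold the pairs (12,10), (10,12), (15,11), (11,15), so K = 8 = 4*N, and the function returns 2: rows 12 and 15 end up in U.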
.PP
First, some rows of the process ICURROW may be swapped locally. For
each such pair, one row belongs to U and the other belongs to my local
piece of A. The other rows of the current block are swapped with
remote rows and are thus not part of U. These rows, however, must be
sent along and grabbed by the other processes as the exchange phase
progresses.
.PP
So, assume that I am ICURROW and consider a row of index IPID(2*i)
that I own. If I also own IPID(2*i+1) and IPID(2*i+1) - IA is less
than N, this row is swapped locally and should be copied into U at
position IPID(2*i+1) - IA; no row will be exchanged for this one. If
IPID(2*i+1) - IA is greater than or equal to N, the row IPID(2*i)
should instead be copied locally into my piece of A, at the position
corresponding to the row of global index IPID(2*i+1).
.PP
If the process ICURROW does not own IPID(2*i+1), then row IPID(2*i)
is to be swapped away and, strictly speaking, belongs not to U but to
A on a remote process. Since this process will send the array U
anyway, the row is copied into U exactly where the row IPID(2*i+1)
should go. To find that position, search IPID for k1 such that
IPID(2*k1) is equal to IPID(2*i+1); row IPID(2*i) is then copied into
U at position IPID(2*k1+1) - IA.
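The search for k1 is a linear scan over the source entries of IPID. In this hypothetical C sketch, ipid is a 0-based array with ipid[j] standing for IPID(j); find_k1 and u_slot_for_remote_row are illustrative names, and -1 signals that no such k1 exists.

```c
#include <assert.h>

/* Hypothetical helper: find k1 such that ipid[2*k1] == row.
 * Returns k1, or -1 if row never appears as a source. */
int find_k1(const int *ipid, int K, int row)
{
    assert(K % 2 == 0);
    for (int k1 = 0; k1 < K / 2; k1++)
        if (ipid[2 * k1] == row)
            return k1;
    return -1;
}

/* Position in U where the remotely swapped row ipid[2*i] is parked:
 * the slot that row ipid[2*i+1] will eventually occupy, i.e.
 * ipid[2*k1+1] - ia.  Returns -1 if IPID contains no matching k1. */
int u_slot_for_remote_row(const int *ipid, int K, int i, int ia)
{
    int k1 = find_k1(ipid, K, ipid[2 * i + 1]);
    return k1 < 0 ? -1 : ipid[2 * k1 + 1] - ia;
}
```

For example, with IA = 4 and rows 4 and 9 as well as rows 5 and 6 interchanged, IPID holds (9,4), (4,9), (6,5), (5,6). For i = 3 (row 5 sent to row 6), k1 = 2 and row 5 is parked at position 1 of U. Since K is at most 4*N, each scan costs O(N).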
.PP
It is thus important to place the entries for rows that go into U,
i.e., those for which IPID(2*i+1) - IA is less than N, at the
beginning of the array IPID. This way, U is formed and the local copy
is performed in a single sweep.
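A stable partition of the pairs achieves this ordering. The C sketch below is illustrative only: the name move_u_rows_first is hypothetical, this page does not say which routine performs the reordering in HPL, and IPID is again taken as a 0-based C array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stable partition of the K/2 (source, destination)
 * pairs of ipid so that pairs whose destination lies in U
 * (ipid[2*k+1] - ia < n) come first, each group keeping its order.
 * Returns the number of such pairs, or -1 on allocation failure. */
int move_u_rows_first(int *ipid, int K, int ia, int n)
{
    assert(K % 2 == 0);
    int *tmp = malloc((size_t)(K > 0 ? K : 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    int front = 0;
    /* First pass: pairs whose destination lies in U. */
    for (int k = 0; k < K / 2; k++)
        if (ipid[2 * k + 1] - ia < n) {
            tmp[2 * front] = ipid[2 * k];
            tmp[2 * front + 1] = ipid[2 * k + 1];
            front++;
        }
    int pos = front;
    /* Second pass: all remaining pairs, in their original order. */
    for (int k = 0; k < K / 2; k++)
        if (ipid[2 * k + 1] - ia >= n) {
            tmp[2 * pos] = ipid[2 * k];
            tmp[2 * pos + 1] = ipid[2 * k + 1];
            pos++;
        }
    memcpy(ipid, tmp, (size_t)K * sizeof *tmp);
    free(tmp);
    return front;
}
```

For IA = 4 and N = 2, the pairs (4,9), (9,4), (5,6), (6,5) become (9,4), (6,5), (4,9), (5,6), and the function returns 2.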
.PP
Two lists, LINDXA and LINDXAU, are built. LINDXA contains the local
index of each of my rows that should be copied. LINDXAU contains the
local destination information: if LINDXAU(k) >= 0, row LINDXA(k) of A
is to be copied into U at position LINDXAU(k); otherwise, row
LINDXA(k) of A should be copied locally into A(-LINDXAU(k),:). In the
process ICURROW, the initial packing algorithm proceeds as follows.
.PP
  for i in [0..K/2),
     if IPID(2*i) is in ICURROW,
        if IPID(2*i+1) is in ICURROW,
           if( IPID(2*i+1) - IA < N )
            save corresponding local position
            of this row (LINDXA);
            save local position (LINDXAU) in U
            where this row goes;
            [copy row IPID(2*i) in U at position
            IPID(2*i+1)-IA; ];
           else
            save corresponding local position of
            this row (LINDXA);
            save local position (-LINDXAU) in A
            where this row goes;
            [copy row IPID(2*i) in my piece of A
            at IPID(2*i+1);]
           end if
        else
           find k1 such that IPID(2*k1) = IPID(2*i+1);
           copy row IPID(2*i) in U at position
           IPID(2*k1+1)-IA;
           save corresponding local position of this
           row (LINDXA);
           save local position (LINDXAU) in U where
           this row goes;
        end if
     end if
  end for
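The packing loop for ICURROW can be sketched in C as follows. Row ownership and local indexing are not specified on this page, so the sketch assumes a plain block-cyclic row distribution with block size nb over nprow process rows, starting at process row 0 (helpers owner_of and local_of, both hypothetical). The name pack_icurrow is also hypothetical, and the bracketed row copies of the pseudocode are omitted: only the index lists are built.

```c
#include <assert.h>

/* Hypothetical block-cyclic mapping of global row g: block size nb,
 * nprow process rows, first block on process row 0. */
static int owner_of(int g, int nb, int nprow) { return (g / nb) % nprow; }
static int local_of(int g, int nb, int nprow) { return (g / (nb * nprow)) * nb + g % nb; }

/* Sketch of the packing loop run by process row icurrow.  Fills lindxa
 * and lindxau and returns the number of entries written, or -1 if IPID
 * is malformed.  lindxau[k] >= 0 is a row position in U; a negative
 * value -j means local row j of A. */
int pack_icurrow(const int *ipid, int K, int ia, int n, int icurrow,
                 int nb, int nprow, int *lindxa, int *lindxau)
{
    assert(K % 2 == 0);
    int cnt = 0;
    for (int i = 0; i < K / 2; i++) {
        int src = ipid[2 * i], dst = ipid[2 * i + 1];
        if (owner_of(src, nb, nprow) != icurrow)
            continue;                           /* not one of my rows */
        lindxa[cnt] = local_of(src, nb, nprow);
        if (owner_of(dst, nb, nprow) == icurrow) {
            if (dst - ia < n)
                lindxau[cnt] = dst - ia;        /* local swap into U */
            else
                lindxau[cnt] = -local_of(dst, nb, nprow); /* copy within my A */
        } else {
            /* Swapped away: park the row in U where row dst will go,
             * i.e. at ipid[2*k1+1] - ia with ipid[2*k1] == dst. */
            int k1 = 0;
            while (k1 < K / 2 && ipid[2 * k1] != dst)
                k1++;
            if (k1 == K / 2)
                return -1;                      /* dst is never a source */
            lindxau[cnt] = ipid[2 * k1 + 1] - ia;
        }
        cnt++;
    }
    return cnt;
}
```

With nb = 2, nprow = 2, IA = 4 (owned by process row 0), N = 2, and IPID holding the pairs (9,4), (4,9), (6,5), (5,6), the call returns 3 entries: LINDXA = (5, 2, 3) and LINDXAU = (0, -5, 1). Row 9 goes to position 0 of U, row 4 is copied locally into the slot of row 9, and row 5, whose destination row 6 lives on process row 1, is parked at position 1 of U. Under the assumed mapping, a local destination in A always lies after IA on the same process, so its local index is positive and the sign encoding stays unambiguous.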
.PP
Second, if I am not the current row process ICURROW, all source rows
in IPID that I own are part of U. Indeed, each of them is swapped with
one row of the current block of rows, and the main factorization
algorithm proceeds one row after another. The processes other than
ICURROW should exchange and accumulate those rows until they receive
some data previously owned by the process ICURROW.
.PP
In processes other than ICURROW, the initial packing algorithm
proceeds as follows. Consider a row of global index IPID(2*i) that I
own. When I receive the data previously owned by ICURROW, i.e., U, row
IPID(2*i) should replace the row of U at position IPID(2*i+1) - IA,
and this particular row of U should first be copied into my piece of
A, at A(il,:), where il is the local row index corresponding to
IPID(2*i). Initially, this row is packed into a work array, say as
its kth row. The following algorithm sets LINDXAU(k) to
IPID(2*i+1) - IA, that is, the position in U where the row should be
copied. LINDXA(k) stores the local index in A where this row of U
should be copied, i.e., il.
.PP
  for i in [0..K/2),
     if I own IPID(2*i),
        copy row IPID(2*i) in work array;
        save corresponding local position
        of this row (LINDXA);
        save position (LINDXAU) in U where
        this row should be copied;
     end if
  end for
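A matching C sketch for a process row myrow different from ICURROW is shown below. It assumes the same kind of plain block-cyclic row distribution (block size nb, nprow process rows, first block on process row 0) through the hypothetical helpers owner_of and local_of; pack_other is likewise a hypothetical name. It selects the rows IPID(2*i) that I own, as the prose describes, and only indicates the copy into the work array with a comment.

```c
#include <assert.h>

/* Hypothetical block-cyclic mapping of global row g: block size nb,
 * nprow process rows, first block on process row 0. */
static int owner_of(int g, int nb, int nprow) { return (g / nb) % nprow; }
static int local_of(int g, int nb, int nprow) { return (g / (nb * nprow)) * nb + g % nb; }

/* Sketch of the packing loop run by a process row myrow != icurrow.
 * For the k-th row packed into the work array, lindxa[k] is its local
 * index il in A and lindxau[k] is the position in U it will replace.
 * Returns the number of packed rows. */
int pack_other(const int *ipid, int K, int ia, int myrow,
               int nb, int nprow, int *lindxa, int *lindxau)
{
    assert(K % 2 == 0);
    int k = 0;
    for (int i = 0; i < K / 2; i++) {
        int src = ipid[2 * i];
        if (owner_of(src, nb, nprow) != myrow)
            continue;                        /* not one of my rows */
        /* The real routine would also copy row src into the work array here. */
        lindxa[k] = local_of(src, nb, nprow);
        lindxau[k] = ipid[2 * i + 1] - ia;
        k++;
    }
    return k;
}
```

For instance, with nb = 2, nprow = 2, IA = 4 and IPID holding the pairs (9,4), (4,9), (6,5), (5,6), process row 1 finds only row 6: LINDXA = (2) and LINDXAU = (1). When U arrives, its row 1 is first copied into local row 2 of A and then replaced by row 6.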
.PP
While we are at it, we also globally figure out how many rows every
process has. This is necessary because it would be rather cumbersome
to compute on the fly during the bidirectional exchange phase. This
information is kept in the array LLEN of size NPROW. Also note that
the arrays LINDXA and LINDXAU have a maximum length of 2*N.
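This page does not spell out exactly which rows LLEN counts. One plausible reading, assumed in the C sketch below, is that LLEN(p) is the number of source rows IPID(2*i) held by process row p under a plain block-cyclic distribution (block size nb, first block on process row 0). Both owner_of and count_rows_per_process are hypothetical names, and the actual routine may count differently, for instance by treating ICURROW specially. Because IPID is global input, every process can compute the same counts without communication.

```c
#include <assert.h>

/* Hypothetical block-cyclic owner of global row g: block size nb,
 * nprow process rows, first block on process row 0. */
static int owner_of(int g, int nb, int nprow) { return (g / nb) % nprow; }

/* Assumed meaning: llen[p] = number of source rows ipid[2*i] held by
 * process row p.  Every process computes the same result from the
 * global array ipid. */
void count_rows_per_process(const int *ipid, int K, int nb, int nprow, int *llen)
{
    assert(K % 2 == 0);
    for (int p = 0; p < nprow; p++)
        llen[p] = 0;
    for (int i = 0; i < K / 2; i++)
        llen[owner_of(ipid[2 * i], nb, nprow)]++;
}
```

With nb = 2, nprow = 2 and the pairs (9,4), (4,9), (6,5), (5,6), the source rows 9, 4, 6 and 5 give LLEN = (3, 1).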
.SH ARGUMENTS
.TP 8
PANEL   (local input/output)    HPL_T_panel *
On entry, PANEL points to the data structure containing the
panel information.
.TP 8
K       (global input)          const int
On entry, K specifies the number of entries in IPID. K is at
least 2*N and at most 4*N.
.TP 8
IPID    (global input)          int *
On entry, IPID is an array of length K. Its K entries contain,
in pairs, the source and final destination rows resulting from
the application of the interchanges.
.TP 8
LINDXA  (local output)          int *
On entry, LINDXA is an array of dimension 2*N. On exit, this
array contains the local indexes of my rows of A that should
be copied; LINDXAU gives their destinations.
.TP 8
LINDXAU (local output)          int *
On entry, LINDXAU is an array of dimension 2*N. On exit, this
array contains the local destination information encoded as
follows. If LINDXAU(k) >= 0, row LINDXA(k) of A is to be
copied into U at position LINDXAU(k). Otherwise, row LINDXA(k)
of A should be copied locally into A(-LINDXAU(k),:).
.TP 8
LLEN    (global output)         int *
On entry, LLEN is an array of length NPROW. On exit, it
contains the number of rows every process has.
.SH SEE ALSO
.BR HPL_pdlaswp00N \ (3),
.BR HPL_pdlaswp00T \ (3),
.BR HPL_pdlaswp01N \ (3),
.BR HPL_pdlaswp01T \ (3).