<HTML>
<HEAD>
<TITLE>HPL References</TITLE>
</HEAD>

<BODY
BGCOLOR = "WHITE"
BACKGROUND = "WHITE"
TEXT = "#000000"
VLINK = "#000099"
ALINK = "#947153"
LINK = "#0000ff">

<H2>HPL References</H2>

<STRONG>
The list of references below contains some published material relevant
to this work. This list is provided for illustrative purposes, and
should be regarded as a starting point for the interested reader.
It is by no means meant to be exhaustive.
</STRONG><BR><BR>

The references have been sorted into five categories and are listed
chronologically within each category. The five categories are
<UL>
<LI><A HREF="references.html#Linpack_Benchmark">Linpack Benchmark</A>
<LI><A HREF="references.html#parallel_LUfact">Parallel LU Factorization</A>
<LI><A HREF="references.html#recursiv_LUfact">Recursive LU Factorization</A>
<LI><A HREF="references.html#parallel_matmul">Parallel Matrix Multiply</A>
<LI><A HREF="references.html#parallel_trsolv">Parallel Triangular Solve</A>
</UL>
<HR NOSHADE>

<H3><A NAME="Linpack_Benchmark">Linpack Benchmark</A></H3>

<UL>

<!-- 1979 ----------------------------------------------------------- -->
<LI><I>LINPACK Users' Guide</I>, J. Dongarra, J. Bunch, C. Moler and
G. W. Stewart, SIAM, Philadelphia, PA, 1979.

<!-- 1989 ----------------------------------------------------------- -->
<LI><I>Performance of Various Computers Using Standard Linear Equations
Software</I>, J. Dongarra, Technical Report CS-89-85, University of
Tennessee, 1989. (An updated version of this report can be found at
<A HREF="http://www.netlib.org/benchmark/performance.ps">
http://www.netlib.org/benchmark/performance.ps</A>).

<!-- 1991 ----------------------------------------------------------- -->
<LI><I>Towards Peak Parallel LINPACK Performance on 400 Transputers</I>,
R. Bisseling and L. Loyens, Supercomputer, Vol. 45, pp. 20-27, 1991.

<LI><I>Massively Parallel LINPACK Benchmark on the Intel Touchstone
DELTA and iPSC/860 Systems</I>, R. van de Geijn, 1991 Annual Users
Conference Proceedings. Intel Supercomputer Users Group, Dallas, TX,
1991.

<!-- 1992 ----------------------------------------------------------- -->
<LI><I>The LINPACK Benchmark on the AP 1000</I>, R. Brent, Frontiers,
1992, pp. 128-135, McLean, VA, 1992.

<!-- 1993 ----------------------------------------------------------- -->
<LI><I>Implementation of BLAS Level 3 and LINPACK Benchmark on the
AP1000</I>, R. Brent and P. Strazdins, Fujitsu Scientific and Technical
Journal, Vol. 5, No. 1, pp. 61-70, 1993.

<!-- 1994 ----------------------------------------------------------- -->
<LI><I>LU Factorization and the LINPACK Benchmark on the Intel
Paragon</I>, D. Womble, D. Greenberg, D. Wheat and S. Riesen, Sandia
Technical Report, 1994.

<!-- 1995 ----------------------------------------------------------- -->
<LI><I>Massively Parallel Distributed Computing: World's First 281
Gigaflop Supercomputer</I>, J. Bolen, A. Davis, B. Dazey, S. Gupta,
G. Henry, D. Robboy, G. Schiffler, D. Scott, M. Stallcup, A. Taraghi,
S. Wheat from Intel SSD, L. Fisk, G. Istrail, C. Jong, R. Riesen,
L. Shuler, from Sandia National Laboratories, Proceedings of the Intel
Supercomputer Users Group 1995.

<!-- 1997 ----------------------------------------------------------- -->
<LI><I>High Performance Software on Intel Pentium Pro Processors or
Micro-Ops to TeraFLOPS</I>, B. Greer and G. Henry, Proceedings of the
SuperComputing 1997 Conference, ACM SIGARCH - IEEE Computer Society
Press - ISBN: 0-89791-985-8, San Jose, CA, 1997.

</UL>
<!-- ------------------------------------------------------------------ -->
<HR NOSHADE>

<H3><A NAME="parallel_LUfact">Parallel LU Factorization</A></H3>

<UL>

<!-- 1986 ----------------------------------------------------------- -->
<LI><I>Communication Complexity of the Gaussian Elimination Algorithm
on Multiprocessors</I>, Y. Saad, Linear Algebra and Its Applications,
Vol. 77, pp. 315-340, 1986.

<!-- 1988 ----------------------------------------------------------- -->
<LI><I>LU Factorization Algorithms on Distributed-Memory Multiprocessor
Architectures</I>, G. Geist and C. Romine, SIAM Journal on Scientific
and Statistical Computing, Vol. 9, pp. 639-649, 1988.

<!-- 1989 ----------------------------------------------------------- -->
<LI><I>Parallel LU Decomposition on a Transputer Network</I>,
R. Bisseling and J. van der Vorst, Lecture Notes in Computer Science,
Springer-Verlag, Eds. G. van Zee and J. van der Vorst, Vol. 384,
pp. 61-77, 1989.

<!-- 1990 ----------------------------------------------------------- -->
<LI><I>The Distributed Solution of Linear Systems Using the Torus-Wrap
Data Mapping</I>, C. Ashcraft, ECA-TR-147, Boeing Computer Services,
Seattle, WA, 1990.

<LI><I>Experiments with Multicomputer LU-Decomposition</I>, E. van de
Velde, Concurrency: Practice and Experience, Vol. 2, pp. 1-26, 1990.

<!-- 1991 ----------------------------------------------------------- -->
<LI><I>A Taxonomy of Distributed Dense LU Factorization Methods</I>,
C. Ashcraft, ECA-TR-161, Boeing Computer Services, Seattle, WA, 1991.

<!-- 1994 ----------------------------------------------------------- -->
<LI><I>The Torus-Wrap Mapping for Dense Matrix Calculations on Massively
Parallel Computers</I>, B. Hendrickson and D. Womble, SIAM Journal on
Scientific and Statistical Computing, Vol. 15, pp. 1201-1226, 1994.

<LI><I>Scalability Issues in the Design of a Library for Dense Linear
Algebra</I>, J. Dongarra, R. van de Geijn and D. Walker, Journal of
Parallel and Distributed Computing, Vol. 22, No. 3, pp. 523-537, 1994.

<!-- 1995 ----------------------------------------------------------- -->
<LI><I>Matrix Factorization using Distributed Panels on the Fujitsu
AP1000</I>, P. Strazdins, Proceedings of the IEEE First International
Conference on Algorithms And Architectures for Parallel Processing
ICA3PP-95, Brisbane, 1995.

<!-- 1996 ----------------------------------------------------------- -->
<LI><I>The Design and Implementation of the ScaLAPACK LU, QR, and
Cholesky Factorization Routines</I>, J. Choi, J. Dongarra, S. Ostrouchov,
A. Petitet, D. Walker and R. C. Whaley, Scientific Programming, Vol. 5,
pp. 173-184, 1996.

</UL>
<!-- ------------------------------------------------------------------ -->
<HR NOSHADE>

<H3><A NAME="recursiv_LUfact">Recursive LU Factorization</A></H3>

<UL>

<!-- 1997 ----------------------------------------------------------- -->
<LI><I>Locality of Reference in LU Decomposition with Partial
Pivoting</I>, S. Toledo, SIAM Journal on Matrix Analysis and
Applications, Vol. 18, No. 4, 1997.

<LI><I>Recursion Leads to Automatic Variable Blocking for Dense
Linear-Algebra Algorithms</I>, F. Gustavson, IBM Journal of Research
and Development, Vol. 41, No. 6, pp. 737-755, 1997.

</UL>
<!-- ------------------------------------------------------------------ -->
<HR NOSHADE>

<H3><A NAME="parallel_matmul">Parallel Matrix Multiply</A></H3>

<UL>

<!-- 1987 ----------------------------------------------------------- -->
<LI><I>Matrix Algorithms on a Hypercube I: Matrix Multiplication</I>,
G. Fox, S. Otto and A. Hey, Parallel Computing, Vol. 3, pp. 17-31, 1987.

<!-- 1990 ----------------------------------------------------------- -->
<LI><I>Basic Matrix Subprograms for Distributed-Memory Systems</I>,
A. Elster, Proceedings of the Fifth Distributed-Memory Computing
Conference, Eds. D. Walker and Q. Stout, IEEE Press, pp. 311-316, 1990.

<!-- 1991 ----------------------------------------------------------- -->
<LI><I>The Parallelization of Level 2 and 3 BLAS Operations on
Distributed-Memory Machines</I>, M. Aboelaze, N. Chrisochoides
and E. Houstis, CSD-TR-91-007, Purdue University, West Lafayette,
IN, 1991.

<!-- 1992 ----------------------------------------------------------- -->
<LI><I>The Multicomputer Toolbox Approach to Concurrent BLAS and LACS</I>,
R. Falgout, A. Skjellum, S. Smith and C. Still, Proceedings of the
Scalable High Performance Computing Conference SHPCC-92, IEEE Computer
Society Press, 1992.

<!-- 1994 ----------------------------------------------------------- -->
<LI><I>A High Performance Matrix Multiplication Algorithm on a
Distributed-Memory Parallel Computer, Using Overlapped Communication</I>,
R. Agarwal, F. Gustavson and M. Zubair, IBM Journal of Research and
Development, Vol. 38, No. 6, pp. 673-681, 1994.

<LI><I>PUMMA: Parallel Universal Matrix Multiplication Algorithms on
Distributed-Memory Concurrent Computers</I>, J. Choi, J. Dongarra and
D. Walker, Concurrency: Practice and Experience, Vol. 6, No. 7,
pp. 543-570, 1994.

<LI><I>Matrix Multiplication on the Intel Touchstone DELTA</I>,
S. Huss-Lederman, E. Jacobson, A. Tsao and G. Zhang, Concurrency:
Practice and Experience, Vol. 6, No. 7, pp. 571-594, 1994.

<!-- 1995 ----------------------------------------------------------- -->
<LI><I>A Three-Dimensional Approach to Parallel Matrix Multiplication</I>,
R. Agarwal, S. Balle, F. Gustavson, M. Joshi and P. Palkar, IBM Journal
of Research and Development, Vol. 39, No. 5, pp. 575-582, 1995.

<!-- 1996 ----------------------------------------------------------- -->
<LI><I>A High Performance Parallel Strassen Implementation</I>,
B. Grayson and R. van de Geijn, Parallel Processing Letters, Vol. 6,
No. 1, pp. 3-12, 1996.

<!-- 1997 ----------------------------------------------------------- -->
<LI><I>Parallel Implementation of BLAS: General Techniques for Level
3 BLAS</I>, A. Chtchelkanova, J. Gunnels, G. Morrow, J. Overfelt and
R. van de Geijn, Concurrency: Practice and Experience, Vol. 9, No. 9,
pp. 837-857, 1997.

<LI><I>A Poly-Algorithm for Parallel Dense Matrix Multiplication on
Two-Dimensional Process Grid Topologies</I>, J. Li, R. Falgout and
A. Skjellum, Concurrency: Practice and Experience, Vol. 9, No. 5,
pp. 345-389, 1997.

<LI><I>SUMMA: Scalable Universal Matrix Multiplication Algorithm</I>,
R. van de Geijn and J. Watts, Concurrency: Practice and Experience,
Vol. 9, No. 4, pp. 255-274, 1997.

</UL>
<!-- ------------------------------------------------------------------ -->
<HR NOSHADE>

<H3><A NAME="parallel_trsolv">Parallel Triangular Solve</A></H3>

<UL>

<!-- 1988 ----------------------------------------------------------- -->
<LI><I>Parallel Solution of Triangular Systems on Distributed-Memory
Multiprocessors</I>, M. Heath and C. Romine, SIAM Journal on Scientific
and Statistical Computing, Vol. 9, pp. 558-588, 1988.

<LI><I>A Parallel Triangular Solver for a Distributed-Memory
Multiprocessor</I>, G. Li and T. Coleman, SIAM Journal on Scientific
and Statistical Computing, Vol. 9, No. 3, pp. 485-502, 1988.

<!-- 1989 ----------------------------------------------------------- -->
<LI><I>A New Method for Solving Triangular Systems on Distributed-Memory
Message-Passing Multiprocessor</I>, G. Li and T. Coleman, SIAM Journal
on Scientific and Statistical Computing, Vol. 10, No. 2, pp. 382-396,
1989.

<!-- 1991 ----------------------------------------------------------- -->
<LI><I>Parallel Triangular System Solving on a Mesh Network of
Transputers</I>, R. Bisseling and J. van der Vorst, SIAM Journal
on Scientific and Statistical Computing, Vol. 12, pp. 787-799, 1991.

</UL>
<!-- ------------------------------------------------------------------ -->

<HR NOSHADE>
<CENTER>
<A HREF = "index.html"> [Home]</A>
<A HREF = "copyright.html"> [Copyright and Licensing Terms]</A>
<A HREF = "algorithm.html"> [Algorithm]</A>
<A HREF = "scalability.html"> [Scalability]</A>
<A HREF = "results.html"> [Performance Results]</A>
<A HREF = "documentation.html"> [Documentation]</A>
<A HREF = "software.html"> [Software]</A>
<A HREF = "faqs.html"> [FAQs]</A>
<A HREF = "tuning.html"> [Tuning]</A>
<A HREF = "errata.html"> [Errata-Bugs]</A>
<A HREF = "references.html"> [References]</A>
<A HREF = "links.html"> [Related Links]</A><BR>
</CENTER>
<HR NOSHADE>
</BODY>
</HTML>