Vai ai contenuti. | Spostati sulla navigazione | Spostati sulla ricerca | Vai al menu | Contatti | Accessibilità

| Crea un account

Silvestri, Francesco (2009) Oblivious Computations on Memory and Network Hierarchies. [Tesi di dottorato]

Full text disponibile come:

[img]
Anteprima
Documento PDF
1069Kb

Abstract (inglese)

The hierarchical organization of the memory and communication systems and the availability of numerous processing units play an important role in the performance of algorithms. Their actual exploitation is made hard by the different configurations they may assume. It is crucial, for economical and portability issues, that algorithms adapt to a wide spectrum of executing platforms, possibly in an automatic fashion. Adaptivity can be achieved through either aware algorithms, which make explicit use of suitable architectural parameters, or oblivious algorithms, whose sequence of operations is independent of the characteristics of the underlying architecture. Oblivious algorithms are more desirable in contexts where architectural parameters are unknown and hard to estimate, and, in some cases, they still exhibit optimal performance across different architectures.
This thesis focuses on the study of oblivious algorithms pursuing two main objectives: the investigation of the potentialities and intrinsic limitations of oblivious computations in comparison with aware ones, and the introduction of the notion of oblivious algorithm in the parallel setting.
We study various aspects concerning the execution of cache-oblivious algorithms for rational permutations, an important class of permutations including matrix transposition and the bit-reversal of a vector. We provide a lower bound, which is also valid for cache-aware algorithms, on the work complexity required by the execution of a rational permutation with an optimal cache complexity. Then, we develop a cache-oblivious algorithm performing any rational permutation, which exhibits optimal work and cache complexities under the tall-cache assumption. We then show that for certain families of rational permutations (including transposition and bit-reversal) no cache-oblivious algorithm can exhibit optimal cache complexity for all values of the cache parameters, while there exists a cache-aware algorithm with this property.
We introduce the network-oblivious framework for the study of oblivious algorithms in the parallel setting. This framework explores the design of bulk-synchronous parallel algorithms that, without resorting to parameters for tuning the performance on the target platform, can execute efficiently on parallel machines with different degree of parallelism and bandwidth/latency characteristics. We illustrate the framework by providing optimal network-oblivious algorithms for a few key problems (i.e., matrix multiplication and transposition, discrete Fourier transform and sorting) and by presenting an impossibility result on the execution of network-oblivious algorithms for matrix transposition, which is similar, in spirit, to the one provided for rational permutations.
Finally, we present a number of network-oblivious algorithms, which exhibit optimal communication and computation complexities, for solving a wide class of computations encompassed by the Gaussian Elimination Paradigm, including all-pairs shortest paths, Gaussian elimination without pivoting and matrix multiplication.

Abstract (italiano)

L'organizzazione gerarchica dei sistemi di memoria e di comunicazione e la disponibilità di molte unità di calcolo influiscono notevolmente sulle prestazioni di un algoritmo. Il loro efficiente utilizzo è limitato dalle differenti configurazioni che possono assumere. È cruciale, per motivi economici e di portabilità, che gli algoritmi si adattino allo spettro delle piattaforme esistenti e che la procedura di adattamento sia il più possibile automatizzata. L'adattività può essere raggiunta sia tramite algoritmi aware, che utilizzano esplicitamente opportuni parametri architetturali, sia tramite algoritmi oblivious, la cui sequenza di operazioni è indipendente dalle caratteristiche dell'architettura sottostante. Gli algoritmi oblivious esibiscono spesso prestazioni ottime su diverse architetture e sono attrattivi soprattutto nei contesti in cui i parametri architetturali sono difficili da stimare o non sono noti.
Questa tesi si focalizza sullo studio degli algoritmi oblivious con due obiettivi principali: l'indagine delle potenzialità e limitazioni delle computazioni oblivious e l'introduzione del concetto di algoritmo oblivious in un sistema parallelo.
Inizialmente, vengono affrontate varie problematiche legate all'esecuzione di algoritmi cache-oblivious per permutazioni razionali, le quali rappresentano un'importante classe di permutazioni che include la trasposizione di matrici e il bit-reversal di vettori. Si dimostra un lower bound, valido anche per algoritmi cache-aware, sul numero di operazioni macchina necessarie per eseguire una permutazione razionale assumendo un numero ottimo di accessi alla memoria. Quindi, si descrive un algoritmo cache-oblivious che esegue qualsiasi permutazione razionale e che richiede un numero ottimo di operazioni macchina e di accessi alla memoria, assumendo l'ipotesi di tall-cache. Infine, si dimostra che per certe famiglie di permutazioni razionali (tra cui la trasposizione e il bit-reversal) non può esistere un algoritmo cache-oblivious che richieda un numero ottimo di accessi alla memoria per tutti i valori dei parametri della cache, mentre esiste un algoritmo cache-aware con tali caratteristiche.
Nella tesi viene poi introdotto il framework network-oblivious per lo studio di algoritmi oblivious in un sistema parallelo. Il framework esplora lo sviluppo di algoritmi paralleli di tipo bulk-synchronous che, senza usare parametri dipendenti dalla macchina, hanno prestazioni efficienti su macchine parallele con differenti gradi di parallelismo e valori di banda/latenza. Vengono inoltre forniti algoritmi network-oblivious per alcuni problemi chiave (moltiplicazione e trasposizione di matrici, trasformata discreta di Fourier, ordinamento) e viene presentato un risultato di impossibilità sull'esecuzione di algoritmi network-oblivious per la trasposizione di matrici che ricorda quello ottenuto per le permutazioni razionali.
Infine, per mostrare ulteriormente le potenzialità del framework, vengono presentati algoritmi network-oblivious ottimi per eseguire un'ampia classe di computazioni risolvibili tramite il paradigma di programmazione ad eliminazione gaussiana, tra cui il calcolo dei cammini minimi in un grafo, l'eliminazione gaussiana senza pivoting e la moltiplicazione di matrici.

Statistiche Download - Aggiungi a RefWorks
Tipo di EPrint:Tesi di dottorato
Relatore:Pietracaprina, Andrea Alberto
Dottorato (corsi e scuole):Ciclo 21 > Scuole per il 21simo ciclo > INGEGNERIA DELL'INFORMAZIONE > INGEGNERIA INFORMATICA ED ELETTRONICA INDUSTRIALI
Data di deposito della tesi:28 Gennaio 2009
Anno di Pubblicazione:28 Gennaio 2009
Parole chiave (italiano / inglese):cache-oblivious algorithm, network-oblivious algorithm, network-oblivious framework, parallel computing, memory hierarchy, model, algorithm, impossibility results, ideal cache, decomposable bsp, Gaussian elimination paradigm
Settori scientifico-disciplinari MIUR:Area 01 - Scienze matematiche e informatiche > INF/01 Informatica
Struttura di riferimento:Dipartimenti > Dipartimento di Ingegneria dell'Informazione
Codice ID:1595
Depositato il:28 Gen 2009
Simple Metadata
Full Metadata
EndNote Format

Bibliografia

I riferimenti della bibliografia possono essere cercati con Cerca la citazione di AIRE, copiando il titolo dell'articolo (o del libro) e la rivista (se presente) nei campi appositi di "Cerca la Citazione di AIRE".
Le url contenute in alcuni riferimenti sono raggiungibili cliccando sul link alla fine della citazione (Vai!) e tramite Google (Ricerca con Google). Il risultato dipende dalla formattazione della citazione.

Alok Aggarwal, Bowen Alpern, Ashok K. Chandra, and Marc Snir. Cerca con Google

A model for hierarchical memory. Cerca con Google

In Proceedings of the 19th ACM Symposium on Theory of Computing, pages 305-314, 1987. Cerca con Google

Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. Cerca con Google

The landscape of parallel computing research: A view from Berkeley. Cerca con Google

Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 2006. Cerca con Google

Lars Arge, Gerth Stølting Brodal, and Rolf Fagerberg. Cerca con Google

Cache-oblivious data structures. Cerca con Google

In Dinesh P. Mehta and Sartaj Sahni, editors, Handbook of Data Structures and Applications, chapter 34, page 27. CRC Press, 2005. Cerca con Google

Bowen Alpern, Larry Carter, Ephraim Feig, and Ted Selker. Cerca con Google

The uniform memory hierarchy model of computation. Cerca con Google

Algorithmica, 12(2/3):72-109, 1994. Cerca con Google

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Cerca con Google

Hierarchical memory with block transfer. Cerca con Google

In Proceedings of the 28th IEEE Symposium on Foundations of Computer Science, pages 204-216, 1987. Cerca con Google

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Cerca con Google

Communication complexity of PRAMs. Cerca con Google

Theoretical Computer Science, 71:3-28, 1990. Cerca con Google

Narasimha R. Adiga and et al. Cerca con Google

An overview of the BlueGene/L supercomputer. Cerca con Google

In Proceedings of the ACM/IEEE Conference on Supercomputing, pages 1-22, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press. Cerca con Google

AEOLUS project website http://aeolus.ceid.upatras.gr. Vai! Cerca con Google

Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. Cerca con Google

The Design and Analysis of Computer Algorithms. Cerca con Google

Addison-Wesley, 1974. Cerca con Google

Lars Arge. Cerca con Google

External geometric data structures. Cerca con Google

In Kyung-Yong Chwa and J. Ian Munro, editors, Proceedings of the 10th Annual International Conference of Computing and Combinatorics, volume 3106 of Lecture Notes in Computer Science. Springer, 2004. Cerca con Google

Arvind. Cerca con Google

Data flow languages and architecture. Cerca con Google

In Proceedings of the 8th Annual Symposium on Computer Architecture, 1981. Cerca con Google

Alok Aggarwal and Jeffrey Scott Vitter. Cerca con Google

The input/output complexity of sorting and related problems. Cerca con Google

Communications of the ACM, 31(9):1116-1127, 1988. Cerca con Google

John K. Backus. Cerca con Google

Can programming be liberated from the Von Neumann style? A functional style and its algebra of programs. Cerca con Google

Communication of the ACM, 21(8):613-641, 1978. Cerca con Google

Sandeep N. Bhatt, Gianfranco Bilardi, and Geppino Pucci. Cerca con Google

Area-universal circuits with constant slowdown. Cerca con Google

In Proceedings of the 18th International Conference on Advanced Research in VLSI, pages 89-98, 1999. Cerca con Google

Armin Bäumker, Wolfgang Dittrich, and Friedhelm Meyer auf der Heide. Cerca con Google

Truly efficient parallel algorithms: 1-optimal multisearch for an extension of the BSP model. Cerca con Google

In Selected papers from the 3rd European Symposium on Algorithms, pages 175-203, Amsterdam, The Netherlands, 1998. Elsevier Science Publishers B. V. Cerca con Google

Armin Bäumker, Wolfgang Dittrich, and Andrea Pietracaprina. Cerca con Google

The complexity of parallel multisearch on coarse-grained machines. Cerca con Google

Algorithmica, 24(3-4):209-242, 1999. Cerca con Google

Laszlo A. Belady. Cerca con Google

A study of replacement algorithms for a virtual storage computer. Cerca con Google

IBM Systems Journal, 5(2):78-101, 1966. Cerca con Google

Alberto Bertoldo. Cerca con Google

An adaptive parallel solver for finite-elements applications. Cerca con Google

PhD thesis, Department of Information Engineering, University of Padova, 2008. Cerca con Google

Gerth Stølting Brodal and Rolf Fagerberg. Cerca con Google

On the limits of cache-obliviousness. Cerca con Google

In Proceedings of the 35th ACM Symposium on Theory of Computing, pages 307-315, June 2003. Cerca con Google

Michael A. Bender, Jeremy T. Fineman, Seth Gilbert, and Bradley C. Kuszmaul. Cerca con Google

Concurrent cache-oblivious B-trees. Cerca con Google

In Proceedings of the 17th ACM Symposium on Parallelism in Algorithms and Architectures, pages 228-237, New York, NY, USA, 2005. ACM. Cerca con Google

Gianfranco Bilardi, Carlo Fantozzi, Andrea Pietracaprina, and Geppino Pucci. Cerca con Google

On the effectiveness of D-BSP as a bridging model of parallel computation. Cerca con Google

In Proceedings of the International Conference on Computational Science-Part II, volume 2074 of Lecture Notes in Computer Science, pages 579-588, London, UK, 2001. Springer-Verlag. Cerca con Google

Gianfranco Bilardi and Franco Preparata. Cerca con Google

Processor-time tradeoffs under bounded-speed message propagation: Part I, upper bounds. Cerca con Google

Theory of Computing Systems, 30:523-546, 1997. Cerca con Google

Gianfranco Bilardi and Franco Preparata. Cerca con Google

Processor-time tradeoffs under bounded-speed message propagation: Part II, lower bounds. Cerca con Google

Theory of Computing Systems, 32:531-559, 1999. Cerca con Google

Gianfranco Bilardi and Enoch Peserico. Cerca con Google

A characterization of temporal locality and its portability across memory hierarchies. Cerca con Google

In Proceedings of the 28th International Colloquium on Automata, Languages and Programming, volume 2076 of Lecture Notes in Computer Science, pages 128-139, 2001. Cerca con Google

Gianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci. Cerca con Google

A quantitative measure of portability with application to bandwidth-latency models for parallel computing. Cerca con Google

In Proceedings of the 5th International Euro-Par Conference on Parallel Processing, volume 1685 of Lecture Notes in Computer Science, pages 543-551, September 1999. Cerca con Google

Gianfranco Bilardi, Andrea Pietracaprina, Geppino Pucci, Fabio Schifano, and Raffaele Tripiccione. Cerca con Google

The potential of on-chip multiprocessing for QCD machines. Cerca con Google

In Proceedings of the 12th International Conference on High-Performance Computing, volume 3769 of Lecture Notes in Computer Science, pages 386-397, 2005. Cerca con Google

Gianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci. Cerca con Google

Decomposable BSP: A bandwidth-latency model for parallel and hierarchical computation. Cerca con Google

In John Reif and Sanguthevar Rajasekaran, editors, Handbook of Parallel Computing: Models, Algorithms and Applications, pages 277-315. CRC Press, 2007. Cerca con Google

Gianfranco Bilardi, Andrea Pietracaprina, Geppino Pucci, and Francesco Silvestri. Cerca con Google

Network-oblivious algorithms. Cerca con Google

In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium, March 2007. Cerca con Google

Thomas Cheatham, Amr F. Fahmy, Dan C. Stefanescu, and Leslie G. Valiant. Cerca con Google

Bulk synchronous parallel computing - a paradigm for transportable software. Cerca con Google

In Proceedings of the 28th Hawaii International Conference on System Sciences, page 268, Washington, DC, USA, 1995. IEEE Computer Society. Cerca con Google

Larry Carter and Kang Su Gatlin. Cerca con Google

Towards an optimal bit-reversal permutation program. Cerca con Google

In Proceedings of the 39th IEEE Symposium on Foundations of Computer Science, pages 544-555, 1998. Cerca con Google

Rezaul A. Chowdhury. Cerca con Google

Cache-efficient Algorithms and Data Structures: Theory and Experimental Evaluation. Cerca con Google

PhD thesis, Department of Computer Sciences, The University of Texas at Austin, August 2007. Cerca con Google

David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Eunice E. Santos, Klaus E. Schauser, Ramesh Subramonian, and Thorsten von Eicken. Cerca con Google

LogP: A practical model of parallel computation. Cerca con Google

Communications of the ACM, 39(11):78-85, November 1996. Cerca con Google

Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K. Patnala, and Mithuna Thottethodi. Cerca con Google

Recursive array layouts and fast parallel matrix multiplication. Cerca con Google

In Proceedings of the 11th ACM Symposium on Parallel Algorithms and Architectures, pages 222-231, 1999. Cerca con Google

Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K. Patnala, and Mithuna Thottethodi. Cerca con Google

Recursive array layouts and fast matrix multiplication. Cerca con Google

IEEE Transactions on Parallel and Distributed Systems, 13(11):1105-1123, November 2002. Cerca con Google

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Cerca con Google

Introduction to Algorithms. Cerca con Google

The MIT Press, Cambridge, MA, USA, second edition, September 2001. Cerca con Google

Thomas H. Cormen. Cerca con Google

Fast permuting on disk arrays. Cerca con Google

Journal of Parallel and Distributed Computing, 17(1-2), 1993. Cerca con Google

Thomas H. Cormen. Cerca con Google

Virtual Memory for Data-Parallel Computing. Cerca con Google

PhD thesis, Massachussetts Institute of Technology, 1993. Cerca con Google

Rezaul A. Chowdhury and Vijaya Ramachandran. Cerca con Google

Cache-oblivious dynamic programming. Cerca con Google

In Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithm, pages 591-600, 2006. Cerca con Google

Rezaul A. Chowdhury and Vijaya Ramachandran. Cerca con Google

The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation. Cerca con Google

In Proceedings of the 19th annual Symposium on Parallelism in Algorithms and Architectures, pages 71-80, New York, NY, USA, 2007. ACM. Cerca con Google

Rezaul A. Chowdhury and Vijaya Ramachandran. Cerca con Google

Cache-efficient dynamic programming algorithms for multicores. Cerca con Google

In Proceedings of the 20th Symposium on Parallelism in Algorithms and Architectures, pages 207-216, New York, NY, USA, 2008. ACM. Cerca con Google

David Culler, J. P. Singh, and Anoop Gupta. Cerca con Google

Parallel Computer Architecture: A Hardware/Software Approach. Cerca con Google

Morgan Kaufmann, August 1998. Cerca con Google

Howard B. Demuth. Cerca con Google

Electronic Data Sorting. Cerca con Google

PhD thesis, Stanford University, October 1956. Cerca con Google

Howard B. Demuth. Cerca con Google

Electronic data sorting. Cerca con Google

IEEE Transactions on Computers, 34(4):296-310, 1985. Cerca con Google

Erik Demaine. Cerca con Google

Cache-oblivious algorithms and data structures. Cerca con Google

Lecture Notes from the EEF Summer School on Massive Data Sets, 2002. Cerca con Google

Frank Dehne, David Hutchinson, Anil Maheshwari, and Wolfgang Dittrich. Cerca con Google

Reducing I/O complexity by simulating coarse grained parallel algorithms. Cerca con Google

In Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing, pages 14-20, Washington, DC, USA, 1999. IEEE Computer Society. Cerca con Google

Pilar De la Torre and Clyde P. Kruskal. Cerca con Google

A structural theory of recursively decomposable parallel processor-networks. Cerca con Google

In Proceedings of the IEEE Symposium on Parallel and Distributeed Processing, page 570, Washington, DC, USA, 1995. IEEE Computer Society. Cerca con Google

Pilar De la Torre and Clyde P. Kruskal. Cerca con Google

Submachine locality in the bulk synchronous setting. Cerca con Google

In Proceedings of the 2nd International Euro-Par Conference on Parallel Processing-Volume II, volume 1124 of Lecture Notes in Computer Science, pages 352-358, August 1996. Cerca con Google

Carlo Fantozzi. Cerca con Google

A Computational Model for Parallel and Hierarchical Machines. Cerca con Google

PhD thesis, Department of Information Engineering, University of Padova, 2003. Cerca con Google

FIPS PUB 46-3. Cerca con Google

Data Encryption Standard (DES). Cerca con Google

National Institute for Standards and Technology, Gaithersburg, MD, USA, October 1999. Cerca con Google

Matteo Frigo and Steven G. Johnson. Cerca con Google

FFTW: an adaptive software architecture for the FFT. Cerca con Google

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 3:1381-1384 vol.3, May 1998. Cerca con Google

Ian Foster and Carl Kesselman. Cerca con Google

The Grid 2: Blueprint for a New Computing Infrastructure. Cerca con Google

Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003. Cerca con Google

Richard W. Floyd. Cerca con Google

Permuting information in idealized two-level storage. Cerca con Google

In R. Miller and J. W. Thatche, editors, Complexity of Computer Computations, pages 105-109. Plenum, 1972. Cerca con Google

Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cerca con Google

Cache-oblivious algorithms. Cerca con Google

In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science, pages 285-298, 1999. Cerca con Google

Carlo Fantozzi, Andrea Pietracaprina, and Geppino Pucci. Cerca con Google

A general Pram simulation scheme for clustered machines. Cerca con Google

International Journal on Foundations of Computer Science, 14(6):1147-1164, 2003. Cerca con Google

Carlo Fantozzi, Andrea Pietracaprina, and Geppino Pucci. Cerca con Google

Translating submachine locality into locality of reference. Cerca con Google

Journal of Parallel and Distributed Computing, 66(5):633-646, 2006. Cerca con Google

Matteo Frigo. Cerca con Google

Portable High-Performance Programs. Cerca con Google

PhD thesis, Massachusetts Institute of Technology, June 1999. Cerca con Google

Matteo Frigo. Cerca con Google

Personal communication, 2008. Cerca con Google

Matteo Frigo and Volker Strumpen. Cerca con Google

The cache complexity of multithreaded cache oblivious algorithms. Cerca con Google

In Proceedings of the 18th ACM Symposium on Parallelism in Algorithms and Architectures, pages 271-280, New York, NY, USA, 2006. ACM. Cerca con Google

Steven Fortune and James Wyllie. Cerca con Google

Parallelism in random access machines. Cerca con Google

In Proceedings of the 10th ACM Symposium on Theory of Computing, pages 114-118, New York, NY, USA, 1978. ACM. Cerca con Google

Phillip B. Gibbons, Y. Matias, and Vijaya Ramachandran. Cerca con Google

Can a shared-memory model serve as a bridging-model for parallel computation? Cerca con Google

Theory of Computing Systems, 32(3):327-359, 1999. Cerca con Google

Leslie M. Goldschlager. Cerca con Google

A unified approach to models of synchronous parallel machines. Cerca con Google

In Proceedings of the 10th annual ACM symposium on Theory of computing, pages 89-94, New York, NY, USA, 1978. ACM. Cerca con Google

Michael T. Goodrich. Cerca con Google

Communication-efficient parallel sorting. Cerca con Google

SIAM Journal on Computing, 29(2):416-432, 1999. Cerca con Google

Gene H. Golub and Charles F. Van Loan. Cerca con Google

Matrix Computations. Cerca con Google

The Johns Hopkins University Press, October 1996. Cerca con Google

Garth A. Gibson, Jeffrey Scott Vitter, and John Wilkes. Cerca con Google

Strategic directions in storage I/O issues in large-scale computing. Cerca con Google

ACM Computer Survey, 28(4):779-793, 1996. Cerca con Google

Jia-Wei Hong and H. T. Kung. Cerca con Google

I/O complexity: The red-blue pebble game. Cerca con Google

In Proceedings of the 13th ACM Symposium on Theory of Computing, pages 326-333, New York, NY, USA, 1981. ACM. Cerca con Google

John L. Hennessy and David A. Patterson. Cerca con Google

Computer Architecture, Fourth Edition: A Quantitative Approach. Cerca con Google

Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006. Cerca con Google

Dror Irony, Sivan Toledo, and Alexander Tiskin. Cerca con Google

Communication lower bounds for distributed-memory matrix multiplication. Cerca con Google

Journal of Parallel and Distributed Computing, 64(9):1017-1026, 2004. Cerca con Google

Joseph JáJá. Cerca con Google

An Introduction to Parallel Algorithms. Cerca con Google

Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 1992. Cerca con Google

Ken Kennedy and John R. Allen. Cerca con Google

Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Cerca con Google

Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002. Cerca con Google

Leslie Robert Kerr. Cerca con Google

The effect of algebraic structure on the computational complexity of matrix multiplication. Cerca con Google

PhD thesis, Cornell University, 1970. Cerca con Google

Donald E. Knuth. Cerca con Google

Art of Computer Programming, Volume 3: Sorting and Searching. Cerca con Google

Addison-Wesley Professional, second edition, April 1998. Cerca con Google

Richard M. Karp and Vijaya Ramachandran. Cerca con Google

Parallel algorithms for shared-memory machines. Cerca con Google

In Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, pages 869-942. MIT Press, 1990. Cerca con Google

Piyush Kumar. Cerca con Google

Cache oblivious algorithms. Cerca con Google

In Meyer et al. [MSS03], pages 193-212. Cerca con Google

Charles E. Leiserson. Cerca con Google

Fat-trees: universal networks for hardware-efficient supercomputing. Cerca con Google

IEEE Transactions on Computers, 34(10):892-901, October 1985. Cerca con Google

Frank Thomson Leighton. Cerca con Google

Introduction to Parallel Algorithms and Architectures: Arrays $ \bullet$ Trees $ \bullet$ Hypercubes. Cerca con Google

Kaufmann, 1992. Cerca con Google

Paul S. Lewis and Sun-Yuan Kung. Cerca con Google

An optimal systolic array for the algebraic path problem. Cerca con Google

IEEE Transactions on Computers, 40(1):100-105, 1991. Cerca con Google

Marjan Merenik, Jan Heering, and Anthony M. Sloane. Cerca con Google

When and how to develop domain-specific languages. Cerca con Google

ACM Computer Surveys, 37(4):316-344, 2005. Cerca con Google

Kurt Mehlhorn and Peter Sanders. Cerca con Google

Scanning multiple sequences via cache memory. Cerca con Google

Algorithmica, 35:2003, 2000. Cerca con Google

Ulrich Meyer, Peter Sanders, and Jop F. Sibeyn, editors. Cerca con Google

Algorithms for Memory Hierarchies, Advanced Lectures (Dagstuhl Research Seminar, March 10-14, 2002), volume 2625 of Lecture Notes in Computer Science. Springer, 2003. Cerca con Google

Edward Neuman. Cerca con Google

Inequalities involving multivariate convex functions ii. Cerca con Google

Proceedings of the American Mathematical Society, 109(4):965-974, 1990. Cerca con Google

Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang. Cerca con Google

The case for a single-chip multiprocessor. Cerca con Google

In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 2-11. ACM Press, 1996. Cerca con Google

Andrea Pietracaprina. Cerca con Google

Lower bound for BSPC matrix multiplication. Cerca con Google

Manuscript, 1995. Cerca con Google

Andrea Pietracaprina, Geppino Pucci, and Francesco Silvestri. Cerca con Google

Cache-oblivious simulation of parallel programs. Cerca con Google

In Proceedings of the 8th Workshop on Advances in Parallel and Distributed Computational Models, April 2006. Cerca con Google

Naila Rahman. Cerca con Google

Algorithms for hardware caches and TLB. Cerca con Google

In Meyer et al. [MSS03], pages 171-192. Cerca con Google

Sandeep Sen, Siddhartha Chatterjee, and Neeraj Dumir. Cerca con Google

Towards a theory of cache-efficient algorithms. Cerca con Google

Journal of the ACM, 49(6):828-858, 2002. Cerca con Google

Francesco Silvestri. Cerca con Google

Simulazione di algoritmi paralleli per il modello D-BSP su una gerarchia di cache ideali. Cerca con Google

Master's thesis, Department of Information Engineering, University of Padova, Italy, October 2005. Cerca con Google

Laurea Thesis (in italian). Cerca con Google

Francesco Silvestri. Cerca con Google

On the limits of cache-oblivious matrix transposition. Cerca con Google

In Ugo Montanari, Donald Sannella, and Roberto Bruni, editors, Proceedings of the 2nd Symposium of Trustworthy Global Computing, volume 4661 of Lecture Notes in Computer Science, pages 233-243. Springer, 2006. Cerca con Google

Francesco Silvestri. Cerca con Google

On the limits of cache-oblivious rational permutations. Cerca con Google

Theoretical Computer Science, 402(2-3):221-233, 2008. Cerca con Google

Jop F. Sibeyn and Michael Kaufmann. Cerca con Google

BSP-like external-memory computation. Cerca con Google

In Proceedings of the 3rd Italian Conference on Algorithms and Complexity, pages 229-240, London, UK, 1997. Springer-Verlag. Cerca con Google

Elizabeth A.M. Shriver and Mark Nodine. Cerca con Google

An introduction to parallel I/O models and algorithms. Cerca con Google

In Input/output in parallel and distributed computer systems, pages 31-68. Kluwer Academic Publishers, 1996. Cerca con Google

John E. Savage and Jeffrey Scott Vitter. Cerca con Google

Parallelism in space-time tradeoffs. Cerca con Google

In Advances in Computing Research, volume 4, pages 117-146. North-Holland, 1987. Cerca con Google

Tokutek website: www.tokutek.com. Vai! Cerca con Google

Leslie G. Valiant. Cerca con Google

A bridging model for parallel computation. Cerca con Google

Communications of the ACM, 33(8):103-111, August 1990. Cerca con Google

Jeffrey Scott Vitter. Cerca con Google

External memory algorithms and data structures: dealing with massive data. Cerca con Google

ACM Computing Surveys, 33(2):209-271, 2001. Cerca con Google

Jeffrey Scott Vitter and Elizabeth A.M. Shriver. Cerca con Google

Algorithms for parallel memory I: Two-level memories. Cerca con Google

Algorithmica, 12(2/3):110-147, 1994. Cerca con Google

Michael Wolfe. Cerca con Google

Parallelizing compilers. Cerca con Google

ACM Computer Surveys, 28(1):261-262, 1996. Cerca con Google

R. Clinton Whaley, Antoine Petitet, and Jack Dongarra. Cerca con Google

Automated empirical optimizations of software and the ATLAS project. Cerca con Google

Parallel Computing, 27(1-2):3-35, 2001. Cerca con Google

Kamen Yotov, Thomas Roeder, Keshav Pingali, John A. Gunnels, and Fred G. Gustavson. Cerca con Google

An experimental comparison of cache-oblivious and cache-conscious programs. Cerca con Google

In Proceedings of the 19th ACM Symposium on Parallel Algorithms and Architectures, 2007. Cerca con Google

Download statistics

Solo per lo Staff dell Archivio: Modifica questo record