SIAM Short Course on Computer Architecture for Mathematicians and Numerical Analysts
Date: Sunday, July 10, 2005
The traditional measures of computational complexity, such as the number of floating-point operations, are poor guides to the realized performance of algorithms on modern computers. Developing efficient algorithms requires an in-depth knowledge of current parallel architectures for high-performance computing, including structures, mechanisms, and operational methods used to deliver sustained performance.
This full-day SIAM tutorial will present a comprehensive description of modern parallel computer architectures, their performance-determining characteristics and metrics, and both hardware and software methods for achieving superior performance through parallel architecture. Participants will acquire a good understanding of the state of the art in high-performance computer system architectures and of the impact that computer architecture has on algorithm design and programming.
In addition, attendees will be exposed to software methods and practices for addressing critical challenges to the practical application of such systems. Moreover, likely directions for future high-performance computing will be discussed, illustrated with innovative concepts being considered by the high-end computing industry.
The goal is to enable mathematicians and numerical analysts to understand the impact of architectural design on performance and to learn how to exploit current and next-generation high-performance parallel computers.
1. Describe principal high-performance computer classes, characteristics, and examples.
2. Identify metrics for performance evaluation, sources of performance-sensitive properties of parallel systems, and strategies for addressing them.
3. Discuss software practices and techniques for high-performance numerical application algorithms.
4. Describe architecture structures for enhanced capability and future systems likely to incorporate such structures.
Level of Material
80% beginner, 20% intermediate
Anyone with an interest in high-performance computing, particularly those developing algorithms or applications for it.
Basic numerical analysis and some familiarity with programming in languages such as Fortran, C, or Java.
Introduction (1 hour)
- Motivation for the tutorial
- Brief history of HEC systems
- Overview of major system classes
- Challenges to delivering sustained performance
- Ways to attack the problems
- Tutorial overview
Uniprocessor architecture (1 hour)
- Principal components and structures
- Dominant metrics and performance models
- Contributing factors to performance degradation
Parallel HPC architecture (1 hour)
- Ideal parallel architectures and execution models
- Real-world parallel architecture classes
- Today's parallel architectures
HPC sources of performance degradation (1 hour)
- Basic issues
- Specific architecture problems
- Architecture structures and mechanisms for efficiency
Algorithmic and software strategies for performance optimization (1.5 hours)
- Exposing parallelism
- Mastering synchronization and communication
- Managing locality and memory access
- Load balancing
Future directions in parallel architecture (0.5 hours)
- Systolic and reconfigurable
- Message-driven split-transaction
- Processor in memory (PIM)
References for further reading:
“Bit Reversal on Uniprocessors,” Alan Karp, SIAM Review, 38, #1, pages 1-26, 1996.
Computer Architecture: A Quantitative Approach, Hennessy and Patterson, Morgan Kaufmann, 3rd edition, 2002.
Parallel Computer Architecture: A Hardware/Software Approach, Culler, Singh, and Gupta, Morgan Kaufmann, 1999.
“Parallel Computer Architectures,” William Gropp, in Sourcebook of Parallel Computing, Jack Dongarra, Ian Foster, Geoffrey Fox, William Gropp, Ken Kennedy, Linda Torczon, and Andy White, editors. Pages 15–42. Morgan Kaufmann, 2003.
Enabling Technologies for Petaflops Computing, Thomas Sterling, Paul Messina, and Paul H. Smith, MIT Press, 1995.
Dr. Thomas Sterling is a principal scientist at the NASA Jet Propulsion Laboratory and faculty associate at the California Institute of Technology. He is a recognized leader in the field of innovative high-performance computer architecture.
Since receiving his Ph.D. from MIT as a Hertz fellow two decades ago, he has conducted extensive research in advanced parallel computer structures and computational models. In 1993, he started the NASA Beowulf Project to harness multiple personal computers, with the goal of accelerating large technical application programs with an order-of-magnitude improvement in performance relative to cost. This led to the creation of the Beowulf class of PC clusters and initiated the emergence of Linux-based commodity clusters, for which he and his colleagues were awarded the Gordon Bell Prize in 1997.
His 1998 MIT Press book How to Build a Beowulf was a landmark work in cluster computing and sold out its first printing in six weeks. Throughout the decade of the 1990s, Sterling was a leader in the National Petaflops Initiative, a loose confederation of experts and institutions across the country sponsored by the federal government to investigate concepts and technologies for enabling systems capable of achieving performance in the transpetaflops regime.
As part of this groundbreaking exploration, he chaired multiple interdisciplinary workshops and coauthored the book Enabling Technologies for Petaflops Computing. He also supported the President's Information Technology Advisory Committee in 1999 and was a member of both the DOD Integrated High End Computing Initiative in 2002 and the multi-agency High End Computing Revitalization Task Force workshop in 2003.
From 1996 to 2000, Sterling was the principal investigator of the HTMT project, which involved more than a dozen institutions and 70 contributors in a design study of a potential future petaflops-scale computer incorporating advanced technologies including superconducting logic, optical communications, holographic storage, and processor-in-memory (PIM) components. Sterling and his team at Caltech and JPL are currently developing a new class of advanced PIM architecture for efficient scalable high-end computing, and he is collaborating on related research with a number of institutions, including the University of Notre Dame, Argonne National Laboratory, Sandia National Laboratories, the University of Delaware, and Cray.
William Gropp received his B.S. in mathematics from Case Western Reserve University in 1977, an M.S. in physics from the University of Washington in 1978, and a Ph.D. in computer science from Stanford in 1982. He held the positions of assistant (1982–1988) and associate (1988–1990) professor in the Computer Science Department of Yale University. In 1990, he joined the numerical analysis group at Argonne, where he is a senior computer scientist and associate director of the Mathematics and Computer Science Division, a senior scientist in the Department of Computer Science at the University of Chicago, and a senior fellow in the Argonne-University of Chicago Computation Institute.
His research interests are in parallel computing, software for scientific computing, and numerical methods for partial differential equations. Gropp has played a major role in the development of the MPI message-passing standard. He is coauthor of MPICH, the most widely used implementation of MPI, and was involved in the MPI Forum as a chapter author for both MPI-1 and MPI-2.
He has coauthored several books on MPI, including Using MPI and Using MPI-2. He has developed adaptive mesh refinement and domain decomposition methods with a focus on scalable parallel algorithms; these algorithms and their application to significant scientific problems are discussed in a book he coauthored, entitled Parallel Multilevel Methods for Elliptic Partial Differential Equations.
Gropp is also one of the designers of the PETSc parallel numerical library and has developed efficient and scalable parallel algorithms for the solution of linear and nonlinear equations. In addition, he is involved in several other advanced computing projects, including performance modeling, data structure modification for ultra-high-performance computers, and development of component-based software to promote interoperability among numerical toolkits.