Thursday, March 13, 1997

Hyatt Regency Minneapolis on Nicollet Mall

SIAM Short Course on
Performance Programming for Scientific Computation

Organizers and Instructors

Bowen Alpern, IBM T. J. Watson Research Center, and Larry Carter, University of California, San Diego and San Diego Supercomputer Center.

Description

Performance programming seeks to improve performance beyond what is achieved by programming an algorithm in the most expedient manner. The goal is to keep each processing element as busy as possible doing useful work. This entails satisfying four requirements: breaking problems into independent subproblems that can be executed concurrently, distributing these subproblems appropriately among the processing elements, making sure that the necessary data is close to its processing element, and overlapping communication with computation where possible. To attain high performance, these requirements must be satisfied whether one views "processing elements" as stages of an arithmetic or vector pipeline, functional units of a CPU, processors of a tightly coupled shared-memory multiprocessor, nodes of a distributed-memory supercomputer, or heterogeneous computers on a network. This tutorial presents general techniques for satisfying each of these requirements and illustrates their use at many different levels of application.
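The locality requirement above is often met by tiling (blocking) loops so that each piece of data is reused many times while it sits in fast memory. The following minimal sketch is ours, not the course's: it contrasts with the straightforward triple loop by iterating over small tiles (the function name and the tile size `bs` are illustrative assumptions), assuming square matrices stored as Python lists of lists.

```python
def matmul_blocked(A, B, n, bs=32):
    """Multiply two n x n matrices using bs x bs tiles.

    The tiled loop order keeps a small working set of A, B, and C
    entries active at once, so each datum loaded into cache is
    reused roughly bs times before being evicted.
    """
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Inner loops stay within one tile of each matrix.
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a * B[k][j]
    return C
```

The same tiling idea applies at every level of the memory hierarchy (registers, cache, main memory, disk), which is why the course treats it as a general technique rather than a machine-specific trick.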

Level of Presentation

30% beginner; 50% intermediate; 20% advanced

The tutorial will use extended examples, including two-dimensional seismic migration, protein matching, and computational linear algebra (matrix factorization, matrix multiplication, and its degenerate cases). Seismic migration is representative of a class of partial differential equation problems, protein matching is a typical dynamic programming application, and linear algebra is ubiquitous. Other examples will be introduced to illustrate particular points. While we will survey a large number of topics and techniques, the emphasis will be on mastering conceptual structures and understanding general principles rather than on learning details.
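To illustrate the dynamic-programming pattern behind problems like protein matching, here is a small sketch of our own (the function name and the scoring values are illustrative assumptions, not the course's materials): a global sequence-alignment score computed row by row, so only two rows of the table are live at once.

```python
def align_score(s, t, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences s and t
    (Needleman-Wunsch-style dynamic programming)."""
    m, n = len(s), len(t)
    # prev[j] = best score aligning the current prefix of s with t[:j]
    prev = [j * gap for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [i * gap] + [0] * n
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # align s[i-1] with t[j-1]
                         prev[j] + gap,       # gap in t
                         cur[j - 1] + gap)    # gap in s
        prev = cur
    return prev[n]
```

Each table entry depends only on its three upper-left neighbors, so entries along an anti-diagonal are independent; this is the kind of structure the course exploits to parallelize and pipeline dynamic programs.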

Intended Audience

The tutorial is intended for computational scientists, application developers, and other professionals who have a need to design, implement, or tune high-performance scientific programs. It should also be of interest to computer scientists who want to develop languages, compilers, operating systems, architectures, and performance monitoring and debugging tools that can better support the needs of the performance programming community.

Instructors

Bowen Alpern received a Ph.D. in Computer Science from Cornell University in 1986. He has been a Research Staff Member in the Mathematical Sciences Department of the IBM T. J. Watson Research Center since 1986. His research interests include performance programming, visualization of computation and architecture, theoretical models of hierarchical memory and parallelism, distributed and parallel computing, message compression, computational linear algebra, and portable high-performance computing. He has published more than twenty-five technical papers in computer science. He taught a graduate-level course in performance programming for the Computer Science Department of Columbia University in 1994.

Larry Carter is a Professor in the Computer Science and Engineering Department of the University of California at San Diego, and a Senior Fellow at the San Diego Supercomputer Center. Dr. Carter received his Ph.D. from the University of California at Berkeley in 1974, and worked until 1994 at IBM's T. J. Watson Research Center in the areas of probabilistic algorithms, compilers, VLSI testing, and high-performance computation. His current research interests include scientific computation, performance programming, parallel computation, and machine and system architecture for high-performance computing.

Bowen and Larry developed the matrix multiplication package initially released with the RS/6000 and helped implement the NAS benchmarks on the IBM SP.

Program

Morning

8:00 Registration

8:30-9:15 Introduction
What is performance programming?
Challenges to attaining high performance
The scientific method
Visualizing computers and computation
Extended example: seismic migration

9:15-10:00 Architecture for Performance Programmers
The RAM and PRAM models
Unblocked matrix multiplication
A two-level memory model
The memory hierarchy
Multiple processing elements and parallelism
Pipelines and their hazards
The Parallel Memory Hierarchy model of computation

10:00-10:30 Coffee

10:30-12:30 General Techniques
Localization
Parallelization
Pipelining
Example: dense linear algebra
Example: integer tallying
Example: protein matching
Extended example: the NAS/CG benchmark

Afternoon

12:30-2:00 Lunch

2:00-3:30 Miscellaneous Tips and Techniques
Reading assembly code
Timing and profiling
A dusty-deck Cray-code example
Inner loop considerations
Example: the NAS/EP benchmark
Message compression

3:30-4:00 Coffee

4:00-4:45 Portable High Performance
The LAPACK paradigm
Polyalgorithms and tuning parameters
Toward a methodology for portable performance

4:45-5:30 Review
Extended example: fast Fourier transforms

5:30 Short Course adjourns

Important Notice

For a complete, updated description of the short course, visit: http://www.research.ibm.com/perfprog/course/PP97.html

The short course will take place on the 2nd Floor in Greenway F-H; coffee breaks will be in the Promenade area on the 2nd Floor; and lunch will be in Greenway A-B/I-J.



MMD, 2/13/97