Performance of Parallel, Scientific Applications: Accomplishments and Challenges
Parallel systems have evolved from homogeneous systems with a single processor per node to current systems that are hierarchical and heterogeneous, built from chip multiprocessors so that each node contains multiple processors. For example, DataStar at the San Diego Supercomputer Center uses the Power4 architecture, which has two cores per chip; each system node consists of four Power4 chips for the P655 or 16 Power4 chips for the P690. To achieve good performance on such systems, it is important to take advantage of the hierarchical organization of the processors (i.e., processors within a chip, between chips, and between nodes) and the heterogeneity of the networks interconnecting the processors. In this talk, I will present techniques that have been used to achieve good performance for parallel scientific applications on a variety of current and former parallel systems. Further, I will discuss the challenges that we face with current and future systems.
Valerie Taylor, Texas A&M University.