High Performance Programming in the Partitioned Global Address Space Model
February 27, 2004
SIAM Associated Conference: Parallel Processing 2004
Kathy Yelick, University of California, Berkeley
This course offers training in a programming model for large-scale parallel machines that is an alternative to the popular message passing model. The model offers some of the convenience of thread-based shared memory programming, while yielding performance that is comparable to (and in some cases better than) MPI. The model has been around for several years, but has recently gained interest because it is being supported by multiple vendors and there are new open source implementations available across platforms. There are three "instances" of the general model that will be described, which should appeal to different members of the audience -- one based on C, one based on Fortran, and one based on Java. The course should be of interest to students, who may still be learning parallel programming techniques, and more senior application scientists with problems that are difficult to implement in a message passing model or are likely to benefit from a one-sided communication model.
Katherine Yelick, U.C. Berkeley and LBNL
Dr. Katherine Yelick received her BS, MS, and PhD degrees in EECS from MIT and is now a Professor at the University of California at Berkeley. She was one of the designers of the Split-C, UPC, and Titanium parallel languages, in addition to leading the Sparsity project and the IRAM Compiler effort.
William Carlson, IDA Center for Computing Sciences
Dr. William Carlson has a BS, MSEE, and PhD in Electrical Engineering. From 1988 to 1990, Dr. Carlson was an Assistant Professor at the University of Wisconsin-Madison, and since then he has been with the IDA Center for Computing Sciences. He currently leads the UPC language design effort.
Tarek El-Ghazawi, The George Washington University
Dr. El-Ghazawi has a PhD from New Mexico State University in Electrical and Computer Engineering. He is a Professor in the Department of Electrical and Computer Engineering at the George Washington University (GWU). He is one of the co-authors of the v1.0 UPC specifications and is currently leading the UPC benchmarking and I/O efforts.
Bob Numrich, University of Minnesota
Bob Numrich, now with the University of Minnesota, began his career as a theoretical physical chemist and then worked first at Control Data and later at Cray Research. He invented the get/put model for the CRAY-T3D, which has evolved into the Shmem Library, and later developed F-- and Co-Array Fortran.
The partitioned global address space programming model has the potential to achieve a balance between ease-of-programming and performance. As in the shared-memory model, one thread may directly read and write memory allocated by another, and the programmer need not specify whether accesses are local or remote. At the same time, the model gives programmers control over parallel program features that are essential for performance, namely locality, load balancing, and synchronization, in a machine-independent manner.
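The idea in the paragraph above can be illustrated with a minimal sketch in plain C. This is not UPC, Co-Array Fortran, or Titanium code; it only simulates the address space inside a single process. The names `THREADS`, `BLOCK`, `partition`, `owner_of`, `global_read`, `global_write`, and `is_local` are hypothetical and chosen for illustration, and the blocked layout is just one possible distribution.

```c
#include <assert.h>
#include <stddef.h>

#define THREADS 4   /* hypothetical number of threads */
#define BLOCK   8   /* elements with affinity to each thread */

/* One partition per thread; together they form one global array
   of THREADS * BLOCK elements (simulated here in one process). */
double partition[THREADS][BLOCK];

/* Thread whose partition holds global index i (blocked layout). */
int owner_of(size_t i)
{
    return (int)(i / BLOCK);
}

/* Global read: the caller names only the global index and does not
   say whether the element is local or remote. */
double global_read(size_t i)
{
    assert(i < (size_t)THREADS * BLOCK);
    return partition[i / BLOCK][i % BLOCK];
}

/* Global write, with the same location-independent interface. */
void global_write(size_t i, double v)
{
    assert(i < (size_t)THREADS * BLOCK);
    partition[i / BLOCK][i % BLOCK] = v;
}

/* Locality stays visible when the programmer wants to exploit it:
   thread `me` can test whether index i lies in its own partition. */
int is_local(int me, size_t i)
{
    return owner_of(i) == me;
}
```

For example, after `global_write(9, 3.5)` the value is stored in the partition of thread 1, and any thread can retrieve it with `global_read(9)`. In a real implementation the remote case would involve communication, which is why the model also lets the programmer query and control locality.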
The model is receiving growing attention from both industry and the research community, due to the recent release of commercial as well as open source research compilers. In this tutorial, we will present the concepts associated with partitioned shared address space programming. These include execution models, synchronization, workload distribution, and memory consistency. We then introduce the syntax and semantics of three parallel programming language instances that are under active use and development. These are Unified Parallel C or UPC, which is developed by a consortium of academia, industry, and government; Co-Array Fortran, which was developed at Cray Inc.; and Titanium, a Java-based language from UC Berkeley.
It will be shown through experimental studies that these paradigms can deliver performance comparable with message passing, while maintaining the ease of programming of the shared memory model. The tutorial will present several recent results related to the specification and implementation of the three languages. The UPC consortium has recently released formal specifications for v1.1, and UPC implementations are currently available for Cray X-1, Cray T3D/E, HP, Sun Servers, SGI O2000/3000, the IBM SP, Myrinet/Linux clusters, and Quadrics/Linux clusters, as well as portable compilers from both U. Michigan and Berkeley/LBNL. Co-Array Fortran runs on both Cray machines and there is a portable open source compiler under development at Rice, which is likely to be available prior to this tutorial. The Titanium language is based on Java, and has a portable open source compiler with several new optimizations for irregular communication and memory access patterns. After describing each of the languages, the lecturers will focus on the concepts the languages have in common and will demonstrate, using experimental case studies, that optimized distributed shared address space codes can match or outperform message passing MPI codes. Various hand optimizations, as well as the opportunities for improving performance via compiler optimizations, will be discussed.
We believe that offering this tutorial at SIAMPP04 will introduce a promising parallel programming paradigm to a large class of new users and will improve the parallel programming productivity of some attendees. The benefits accrue both to the audience and to the community of language developers, who need feedback from users.
The audience is expected to learn a great deal about parallel programming issues as well as about the three rising programming languages. Some of today's parallel computers from leading supercomputing companies like Cray and HP are delivered with these languages. There is real evidence that this trend will continue, and there is already a need for more professionals who are familiar with these languages and the concepts behind them.
Content Level: 30% Intro, 50% Intermediate, and 20% Advanced
Engineers and Scientists interested in high-performance computing, students, and other high-performance computing professionals from academia, government labs, or industry. The tutorial is primarily designed for application programmers, but should also be of interest to compiler writers and other developers of parallel software infrastructure.
The audience is expected to have a reasonable understanding of a basic programming language such as C, Fortran, or Java. Prior knowledge of parallel programming is not required, although attendees with experience in other parallel models such as shared memory threads or message passing will also find the tutorial useful.
The Partitioned Shared Address Space Programming Model
Parallel Execution model
Consistency and Synchronization
The UPC Programming Language
Memory model: Data and pointers
Dynamic Memory allocation
Language Specifications and Current Implementations
Libraries and I/O
Case Studies and Optimizations
Memory Model and Runtime Support
Syntax and Semantics
Synchronization and Control
Libraries and I/O
Case Studies and Optimizations
Additions to sequential Java
Design of distributed data structures using this model.
Libraries and I/O
Application Case Studies and Optimizations
Relations among UPC, Co-Array Fortran, and the Programming Model
Performance Results, Issues, and Expectations