UR Research > Computer Science Department > CS Systems Technical Reports >

Scheduling for Locality in Shared-Memory Multiprocessors

URL to cite or link to: http://hdl.handle.net/1802/824

93.tr457.scheduling_for_locality_in_shared_memory_multiproces.ps   1.09 MB (No. of downloads : 125)
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 1993. Simultaneously published in the Technical Report series.
The last decade has produced enormous improvements in processor speed without a corresponding improvement in bus or interconnection network speeds. As a result, the relative costs of communication and computation in shared-memory multiprocessors have changed dramatically. An important consequence of this trend is that many parallel applications, whose performance depends on a delicate balance between the cost of communication and computation, do not execute efficiently on today's shared-memory multiprocessors. In this dissertation we quantify the effect of this trend in multiprocessor architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to efficiently map parallel programs and programming models to modern shared-memory multiprocessors. Our experiments with application programs on bus-based, cache-coherent machines like the Sequent Symmetry, and large-scale distributed-memory machines like the BBN Butterfly, confirm that applications scale much better on previous-generation machines than on current machines due to the rising cost of communication. Our experiments also suggest that shared-memory programming models, which can be implemented efficiently on the machines of yesterday, do not readily port to state-of-the-art machines. As a solution, we propose new decomposition and scheduling algorithms that significantly reduce communication overhead. Our scheduling algorithms, which apply equally well to run-time libraries and parallelizing compilers, attempt to co-locate processes and data, assigning processes to processors based on the location of the date they will access. Our experiments over a wide variety of shared-memory multiprocessors demonstrate that the performance benefits of these scheduling-for-locality algorithms are significant, improving performance by up to 60\% for some applications on modern machines. We conclude that communication overhead need not dominate performance on present or future multiprocessors, given an appropriate programming model, multiprogramming scheduling policy, and user-level decomposition and scheduling algorithms.
Contributor(s):
Evangelos P. Markatos - Author

Thomas J. LeBlanc - Thesis Advisor

Primary Item Type:
Technical Report
Secondary Item Type(s):
Thesis
Series/Report Number:
UR CSD / TR457
Language:
English
Subject Keywords:
shared-memory multiprocessors;multiprogramming;architecture trends;loop scheduling;lightweight thread scheduling
First presented to the public:
5/1993
Original Publication Date:
5/1993
Previously Published By:
University of Rochester. Computer Science Department.
Citation:
License Grantor / Date Granted:
Suzanne S. Bell / 2009-08-01 13:03:13.872 ( View License )
Date Deposited
2009-08-01 13:03:13.868
Date Last Updated
2012-09-26 16:35:14.586719
Submitter:
Suzanne S. Bell

Copyright © This item is protected by copyright, with all rights reserved.

All Versions

Thumbnail Name Version Created Date
Scheduling for Locality in Shared-Memory Multiprocessors1 2009-08-01 13:03:13.868