Spear, Michael F..zip (Restricted Access)
You can try Logging In
Zip file of original documents (Admin only)
Thesis (Ph. D.)--University of Rochester. Dept. of Computer Science, 2009.
In the past, only a small group of highly-skilled programmers were expected to write
programs that used multiple processors simultaneously. However, microprocessor vendors
have recently turned to multi-core chip designs as the most profitable way to increase
performance. We are now seeing multi-core processors in desktops, laptops,
handheld computers, and even embedded devices. As a result, parallel programming is
becoming a core competency for all programmers.
When concurrent threads of a parallel program share data, they must ensure that
certain sets of accesses to shared memory execute as indivisible operations (that is,
the regions must appear to execute atomically). The dominant mechanism for providing
atomicity is mutual exclusion locks. Unfortunately, locks are a very low-level
mechanism, widely regarded as too difficult for the average programmer to use correctly.
Fortunately, locks are not the only mechanism that provides atomicity. Database
transactions (used by millions of programmers to write highly concurrent e-commerce
code) also ensure that regions of code execute atomically. Many efforts are underway
to employ transactions to implement atomicity within general-purpose programming
languages, in the form of Transactional Memory (TM).
TM can be implemented in hardware, software, or a combination of the two. Pure
software TM (STM) systems are of particular interest, since they can serve as a replacement
for lock-based atomicity in millions of existing multicore processors. In order for an STM implementation to provide a viable implementation of atomicity, it must not
introduce substantial single-thread latency, it must offer good scalability and throughput,
and it must be general enough to serve as a replacement for most, if not all, of the
uses of locks.
This dissertation introduces three techniques to reduce the latency of STM. The
first source of latency we target relates to consistency checks. While some amount
of checking is necessary by an STM runtime to detect when two in-flight transactions
wish to make incompatible accesses to the same region of memory, we observe that
a lightweight commit counter can avoid most checks, without weakening the STM’s
correctness guarantees. Secondly, we present the RingSTM algorithm, which further
lowers overheads by decreasing the precision of conflict detection: by summarizing the
access patterns of transactions with fixed-size Bloom filters, RingSTM is able to avoid
most uses of costly read-modify-write operations. Lastly, we show how STM-specific
compiler optimizations can decrease the cost of memory fences for STM algorithms
implemented on processors with relaxed memory consistency.
Some of our latency-reducing mechanisms also improve throughput. The commit
counter heuristic can approximate a novel conflict-detection scheme called mixed invalidation,
which prevents transactions from aborting due to conflicts between reader
and writer transactions, so long as the reader commits first. The RingSTM algorithm
guarantees progress, ensuring that no transaction aborts unless some other transaction
commits. We also present a novel approach to contention management (the mechanisms
used by an STM to ensure forward progress). Our contention management policy ensures
good throughput even with high rates of conflict. It also provides a degree of
fairness under high contention, without introducing latency when contention is low. Locks are often used to protect I/O operations, and lock-based code regularly coexists
with unsynchronized code. Since most STM implementations use speculation
and rollback, they cannot easily support I/O and irreversible operations unless they
prevent all concurrency whenever a transaction wishes to perform an irreversible operation.
We show how to allow an inevitable transaction (one that will not roll back, and
thus can perform irreversible operations) to execute concurrently with non-inevitable
transactions. We also present a semantics for STM based on strict serializability. This
semantics allows transactions to transition data between shared and thread-local states
in the same manner as locks. Furthermore, our semantics preserves serializability of all
transactions, without introducing latency at the boundaries of transactions that do not
transition data between shared and thread-local states. Taken together, our inevitability
mechanisms and semantics greatly improve the generality of transactional memory.