Title
Data dissemination for distributed computing.
Abstract
Large-scale distributed systems provide an attractive scalable infrastructure for network
applications. However, the loosely coupled nature of this environment can make
data access unpredictable, and in the limit, unavailable. This thesis strives to provide
predictability in data access for data-intensive computing in large-scale computational
infrastructures.
A key requirement for achieving predictability in data access is the ability to estimate
network performance for data transfer so that computation tasks can take advantage
of the estimation in their deployment or data source selection. This thesis develops
a framework called OPEN (Overlay Passive Estimation of Network Performance) for
scalable network performance estimation. OPEN provides an estimation of end-to-end
accessibility for applications by utilizing past measurements without the use of explicit
probing. Unlike existing passive approaches, OPEN is not restricted to pairwise or
single-network measurements when utilizing historical information; instead, it shares
measurements between nodes without any restrictions. As a result, it achieves n²
estimations from O(n) measurements.
In addition, this thesis considers data dissemination in two specific environments.
First, we consider a parallel data access environment in which multiple replicated servers
can be utilized to download a single data file in parallel. To improve both performance
and fault tolerance, we present a new parallel data retrieval algorithm and explore a
broad set of resource selection heuristics. Second, we consider collective data access
in applications for which group performance is more important than individual performance.
In this work, we employ communication makespan as a group performance
metric and propose server selection heuristics to maximize collective performance.
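As a rough illustration of the group metric mentioned above, the following sketch takes communication makespan to be the transfer time of the slowest client under a given client-to-server assignment, and picks the candidate assignment with the smallest makespan. The function names, the data layout, and the example transfer times are hypothetical and are not taken from the dissertation; the heuristics it proposes may differ substantially.

```python
from typing import Dict, List

# Hypothetical estimated transfer times in seconds: times[client][server].
TransferTimes = Dict[str, Dict[str, float]]
Assignment = Dict[str, str]  # client -> chosen server


def makespan(assignment: Assignment, times: TransferTimes) -> float:
    """Return the transfer time of the slowest client in the group."""
    return max(times[client][server] for client, server in assignment.items())


def best_assignment(candidates: List[Assignment], times: TransferTimes) -> Assignment:
    """Pick the candidate assignment with the smallest makespan."""
    return min(candidates, key=lambda a: makespan(a, times))


# Illustrative numbers only (assumed, not measured).
times = {
    "c1": {"s1": 4.0, "s2": 9.0},
    "c2": {"s1": 5.0, "s2": 6.0},
    "c3": {"s1": 3.0, "s2": 8.0},
}
plan_a = {"c1": "s1", "c2": "s2", "c3": "s1"}
plan_b = {"c1": "s1", "c2": "s1", "c3": "s2"}

chosen = best_assignment([plan_a, plan_b], times)
```

Under these assumed numbers, plan_a finishes when its slowest client does at 6.0 seconds, while plan_b takes 8.0 seconds, so plan_a is the better choice for the group even though client c2 individually prefers server s1.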
Description
University of Minnesota Ph.D. dissertation. February 2010. Major: Computer Science. Advisors: Prof. Jon B. Weissman, Prof. Abhishek Chandra. 1 computer file (PDF); x, 129 pages. Ill. (some col.)
Suggested Citation
Kim, Jinoh. (2010). Data dissemination for distributed computing. Retrieved from the University of Minnesota Digital Conservancy, https://hdl.handle.net/11299/59575.