Deakin University

File(s) under permanent embargo

A novel workflow-level data placement strategy for data-sharing scientific cloud workflows

journal contribution
posted on 2019-05-01, 00:00 authored by Xuejun Li, Lei Zhang, Yang Wu, Xiao Liu, Erzhou Zhu, Huikang Yi, Futian Wang, Cheng Zhang, Yun Yang
Cloud computing can provide a more cost-effective way to deploy scientific workflows than traditional distributed computing environments such as clusters and grids. Because scientific datasets are large, data placement plays an important role in scientific cloud workflow systems, both for improving system performance and for reducing data transfer cost. Traditional task-level data placement strategies consider only the datasets shared within an individual workflow when reducing data transfer cost. However, a task-level strategy is not necessarily good enough when multiple workflows must be handled together at the workflow level. In this paper, a novel workflow-level data placement model is constructed that regards multiple workflows as a whole. A two-stage data placement strategy is then proposed: it first pre-allocates initial datasets to suitable datacenters during the workflow build-time stage, and then dynamically distributes newly generated datasets to appropriate datacenters during the runtime stage. Both stages use an efficient discrete particle swarm optimization algorithm to place flexible-location datasets. Comprehensive experiments demonstrate that our workflow-level data placement strategy can be more cost-effective than its task-level counterpart for data-sharing scientific cloud workflows.
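The record does not reproduce the paper's model or algorithm, so the following is only an illustrative sketch of the general idea the abstract describes: a discrete particle swarm search that assigns datasets to datacenters in two stages. Everything in it is an assumption made for illustration, including the cost function (total size of the datasets each task must fetch from another datacenter), the update probabilities, and the names `transfer_cost`, `discrete_pso`, and `fixed`. It should not be read as the authors' method.

```python
import random


def transfer_cost(placement, tasks, sizes):
    """Toy cost (an assumption, not the paper's model): each task runs in a
    fixed datacenter and pays the size of every dataset stored elsewhere."""
    return sum(
        sizes[d]
        for dc, needed in tasks
        for d in needed
        if placement[d] != dc
    )


def discrete_pso(tasks, sizes, n_dc, fixed=None, particles=20, iters=100, seed=0):
    """Hypothetical discrete PSO: a particle is a dataset -> datacenter map.
    Datasets listed in `fixed` keep their location; the rest are searched."""
    rng = random.Random(seed)
    fixed = dict(fixed or {})
    free = sorted(d for d in sizes if d not in fixed)

    def cost(p):
        return transfer_cost(p, tasks, sizes)

    def random_placement():
        p = dict(fixed)
        for d in free:
            p[d] = rng.randrange(n_dc)
        return p

    swarm = [random_placement() for _ in range(particles)]
    pbest = [dict(p) for p in swarm]   # best placement seen by each particle
    gbest = dict(min(swarm, key=cost))  # best placement seen by the swarm

    for _ in range(iters):
        for i, p in enumerate(swarm):
            for d in free:
                r = rng.random()
                if r < 0.1:
                    p[d] = rng.randrange(n_dc)  # random exploration
                elif r < 0.5:
                    p[d] = pbest[i][d]          # pull toward personal best
                elif r < 0.9:
                    p[d] = gbest[d]             # pull toward global best
                # otherwise the dataset stays where it is
            if cost(p) < cost(pbest[i]):
                pbest[i] = dict(p)
            if cost(p) < cost(gbest):
                gbest = dict(p)
    return gbest, cost(gbest)


# Stage 1 (build time): place the initial datasets of all workflows together.
tasks = [(0, ["a", "b"]), (0, ["a"]), (1, ["b", "c"]), (1, ["c"])]
sizes = {"a": 10, "b": 5, "c": 8}
build_time, build_cost = discrete_pso(tasks, sizes, n_dc=2)

# Stage 2 (runtime): a newly generated dataset "d" appears; datasets that
# were already placed stay fixed and only "d" is placed.
runtime_tasks = tasks + [(1, ["a", "d"])]
runtime_sizes = dict(sizes, d=4)
run_time, run_cost = discrete_pso(
    runtime_tasks, runtime_sizes, n_dc=2, fixed=build_time
)
```

In this toy setting all tasks from all workflows are passed in one list, which mirrors the workflow-level view of treating multiple workflows as a whole. A more realistic version would also need datacenter capacity limits and bandwidth-dependent costs, both of which are omitted here.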

History

Journal

IEEE transactions on services computing

Volume

12

Issue

3

Season

May-June

Pagination

370 - 383

Publisher

IEEE

Location

Piscataway, N.J.

ISSN

1939-1374

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2016, IEEE