Fault-tolerant Scheduling of Fine-grained Tasks in Grid Environments

G. Wrzesinska, R.V. van Nieuwpoort, J. Maassen, T. Kielmann, H.E. Bal

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Divide-and-conquer is a well-suited programming paradigm for parallel Grid applications. Our Satin system efficiently schedules the fine-grained tasks of a divide-and-conquer application across multiple clusters in a Grid. To accommodate long-running applications, we present a fault-tolerance mechanism for Satin that has negligible overhead during normal execution, while minimizing the amount of redundant work done after a crash of one or more nodes. We study the impact of our fault-tolerance mechanism on application efficiency, both on the Dutch DAS-2 system and using the European testbed of the EC-funded project GridLab. © 2006 SAGE Publications.
Original language: English
Pages (from-to): 103-114
Journal: High performance computing applications
Volume: 20
Issue number: 1
Publication status: Published - 2006

Bibliographical note

WrzesinskaHPA05
