When parallel applications run in large-scale distributed environments such as grids, peer-to-peer (P2P) systems, and clouds, the set of resources in use can change dynamically as machines crash, reservations end, and new resources become available. It is vital for applications to respond to these changes, so they must keep track of the available resources, a problem that is notoriously difficult. In this article we argue that resource tracking must be provided as standard functionality in the lower layers of the software stack. We propose a general solution to resource tracking: the Join-Elect-Leave (JEL) model. JEL provides unified resource tracking for parallel and distributed applications across environments. It is a simple yet powerful model based on notifying applications when resources have joined or left the computation. We demonstrate that JEL is suitable for resource tracking in a wide variety of programming models, ranging from the fixed resource sets traditionally used in MPI-1 to flexible grid-oriented programming models. We compare several JEL implementations and show that they perform and scale well in several real-world scenarios, including grids, clouds, and P2P systems used concurrently, and wide-area systems with failing resources. Using JEL, we have won first prize in a number of international distributed computing competitions. Copyright © 2010 John Wiley & Sons, Ltd.
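To make the notification idea concrete, the following is a minimal in-process sketch of a join/elect/leave registry. It is not the JEL interface described in the article: the class `Registry`, its methods, the event names `"joined"` and `"left"`, and the election rule (the first live candidate wins until it leaves) are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from typing import Callable, Dict, List, Optional

# A listener receives (event, resource), where event is "joined" or "left".
Listener = Callable[[str, str], None]


class Registry:
    """Toy membership registry (hypothetical; not the actual JEL API)."""

    def __init__(self) -> None:
        self.members: List[str] = []        # live resources, in join order
        self.listeners: List[Listener] = []
        self.elected: Dict[str, str] = {}   # election name -> current winner

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def _notify(self, event: str, resource: str) -> None:
        for listener in self.listeners:
            listener(event, resource)

    def join(self, resource: str) -> None:
        if resource not in self.members:
            self.members.append(resource)
            self._notify("joined", resource)

    def leave(self, resource: str) -> None:
        if resource in self.members:
            self.members.remove(resource)
            # Assumption: an election won by a departed resource is reopened.
            for name in [n for n, w in self.elected.items() if w == resource]:
                del self.elected[name]
            self._notify("left", resource)

    def elect(self, name: str, candidate: str) -> Optional[str]:
        """Return the winner of election `name`; the first live candidate wins."""
        if candidate not in self.members:
            return self.elected.get(name)
        return self.elected.setdefault(name, candidate)
```

An application would subscribe a callback, then react to `"joined"` and `"left"` events, for example by redistributing work. A real implementation would also have to detect crashed machines and propagate events across the network, which this single-process sketch ignores.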
|Journal|Concurrency and Computation: Practice and Experience|
|Publication status|Published - 2011|