Cluster communication protocols for parallel-programming systems

K. Verstoep, R.A.F. Bhoedjang, T. Rühl, H.E. Bal, R. Hofman

Research output: Contribution to Journal, Article, Academic, peer-review

Abstract

Clusters of workstations are a popular platform for high-performance computing. For many parallel applications, efficient use of a fast interconnection network is essential for good performance. Several modern System Area Networks include programmable network interfaces that can be tailored to perform protocol tasks that otherwise would need to be done by the host processors. Finding the right trade-off between protocol processing at the host and the network interface is difficult in general. In this work, we systematically evaluate the performance of different implementations of a single, user-level communication interface. The implementations make different architectural assumptions about the reliability of the network and the capabilities of the network interface. The implementations differ accordingly in their division of protocol tasks between host software, network-interface firmware, and network hardware. Also, we investigate the effects of alternative data-transfer methods and multicast implementations, and we evaluate the influence of packet size. Using microbenchmarks, parallel-programming systems, and parallel applications, we assess the performance of the different implementations at multiple levels. We use two hardware platforms with different performance characteristics to validate our conclusions. We show how moving protocol tasks to a relatively slow network interface can yield both performance advantages and disadvantages, depending on specific characteristics of the application and the underlying parallel-programming system.
Original language: English
Pages (from-to): 281-325
Journal: ACM Transactions on Computer Systems
Volume: 22
Issue number: 3
Publication status: Published - 2004

Bibliographical note

1012269
