Overcoming data locality: An in-memory runtime file system with symmetrical data distribution

A. Uta, A. Sandu, T. Kielmann

Research output: Contribution to Journal › Article › Academic › peer-review


In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files; application performance strongly depends on the file system in use. The state of the art uses runtime systems providing in-memory file storage that is designed for data locality: files are placed on those nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, leading to application slowdown, and worse, to prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes, based on a distributed hash function. Our cluster experiments with Montage and BLAST workflows, using up to 512 cores, show that MemFS has both better performance and better scalability than the state-of-the-art, locality-based file system, AMFS. Furthermore, our evaluation on a public commercial cloud validates our cluster results. On this platform MemFS shows excellent scalability up to 1024 cores and is able to saturate the 10G Ethernet bandwidth when running BLAST and Montage.
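The abstract contrasts two placement policies. Under locality-based placement, a file lives on the node that writes it. MemFS instead stripes each file across all compute nodes using a distributed hash function. The Python sketch below models both policies to show why striping avoids storage imbalance. It is a simplified illustration, not MemFS's or AMFS's actual implementation. The stripe size, the SHA-1-modulo mapping, and the names `stripe_owner`, `place_striped`, `place_local`, and `example_workload` are all hypothetical. The abstract does not specify which hash function or stripe size MemFS uses.

```python
import hashlib

# Hypothetical stripe size; the abstract does not state MemFS's actual value.
STRIPE_SIZE = 512 * 1024


def stripe_owner(path: str, stripe_index: int, num_nodes: int) -> int:
    """Map (file, stripe) to a node by hashing. Assumed scheme: SHA-1 mod N."""
    key = f"{path}#{stripe_index}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def place_striped(files, num_nodes, stripe_size=STRIPE_SIZE):
    """Bytes stored per node when every file is split into hashed stripes."""
    load = [0] * num_nodes
    for path, size, _writer in files:
        n_stripes = max(1, -(-size // stripe_size))  # ceiling division
        for i in range(n_stripes):
            chunk = min(stripe_size, size - i * stripe_size)
            load[stripe_owner(path, i, num_nodes)] += chunk
    return load


def place_local(files, num_nodes):
    """Bytes stored per node when each whole file stays on its writer node
    (a simplified model of locality-based placement)."""
    load = [0] * num_nodes
    for _path, size, writer in files:
        load[writer] += size
    return load


def example_workload():
    """Hypothetical workload: one task on node 0 writes a large aggregated
    output, while the other tasks write small intermediate files."""
    mib = 2**20
    files = [("/out/mosaic.fits", 64 * mib, 0)]
    files += [(f"/tmp/part{i}.dat", 1 * mib, i % 8) for i in range(1, 16)]
    return files


if __name__ == "__main__":
    nodes = 8
    files = example_workload()
    for name, load in (("locality", place_local(files, nodes)),
                       ("striped", place_striped(files, nodes))):
        mean = sum(load) / nodes
        print(f"{name:9s} max/mean storage = {max(load) / mean:.2f}")
```

With locality-based placement, the node that writes the large file holds most of the data. Hashed stripes spread the same bytes roughly evenly across the nodes. A plain modulo mapping reassigns most stripes whenever the node count changes. Consistent hashing is the usual remedy when nodes can join or leave.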
Original language: English
Pages (from-to): 144-158
Journal: Future Generation Computer Systems
Publication status: Published - 2016

