Stepwise Refinement for Performance: a Methodology for Many-core Programming

Research output: Contribution to JournalArticleAcademicpeer-review

Abstract

Many-core hardware is targeted specifically at obtaining high performance, but reaching high performance is often challenging because hardware-specific details have to be taken into account. Although there are many programming systems that try to alleviate many-core programming, some providing a high-level language, others providing a low-level language for control, none of these systems have a clear and systematic methodology as a foundation. In this article, we propose stepwise-refinement for performance: a novel, clear, and structured methodology for obtaining high performance on many-cores. We present a system that supports this methodology, offers multiple levels of abstraction to provide programmers a trade-off between high-level and low-level programming, and provides programmers detailed performance feedback. We evaluate our methodology with several widely varying compute kernels on two different many-core architectures: a Graphical Processing Unit (GPU) and the Xeon Phi. We show that our methodology gives insight in the performance, and that in almost all cases, we gain a substantial performance improvement using our methodology.
Original languageEnglish
JournalConcurrency and Computation: Practice and Experience
Volume27
Early online date9 Feb 2015
DOIs
Publication statusPublished - 2015

Fingerprint

Many-core
Refinement
Machine oriented languages
Programming
Hardware
Computer systems programming
High level languages
Methodology
Computer programming
High Performance
Feedback
Processing
Trade-offs
kernel
Unit
Evaluate

Cite this

@article{1666665f6f294844b3beb217d3f79682,
title = "Stepwise Refinement for Performance: a Methodology for Many-core Programming",
abstract = "Many-core hardware is targeted specifically at obtaining high performance, but reaching high performance is often challenging because hardware-specific details have to be taken into account. Although there are many programming systems that try to alleviate many-core programming, some providing a high-level language, others providing a low-level language for control, none of these systems have a clear and systematic methodology as a foundation. In this article, we propose stepwise-refinement for performance: a novel, clear, and structured methodology for obtaining high performance on many-cores. We present a system that supports this methodology, offers multiple levels of abstraction to provide programmers a trade-off between high-level and low-level programming, and provides programmers detailed performance feedback. We evaluate our methodology with several widely varying compute kernels on two different many-core architectures: a Graphical Processing Unit (GPU) and the Xeon Phi. We show that our methodology gives insight in the performance, and that in almost all cases, we gain a substantial performance improvement using our methodology.",
author = "P. Hijma and {van Nieuwpoort}, R.V. and C.J.H. Jacobs and H.E. Bal",
year = "2015",
doi = "10.1002/cpe.3416",
language = "English",
volume = "27",
journal = "Concurrency and Computation: Practice and Experience",
issn = "1532-0626",
publisher = "John Wiley and Sons Ltd.",

}

TY - JOUR

T1 - Stepwise Refinement for Performance: a Methodology for Many-core Programming

AU - Hijma, P.

AU - van Nieuwpoort, R.V.

AU - Jacobs, C.J.H.

AU - Bal, H.E.

PY - 2015

Y1 - 2015

N2 - Many-core hardware is targeted specifically at obtaining high performance, but reaching high performance is often challenging because hardware-specific details have to be taken into account. Although there are many programming systems that try to alleviate many-core programming, some providing a high-level language, others providing a low-level language for control, none of these systems have a clear and systematic methodology as a foundation. In this article, we propose stepwise-refinement for performance: a novel, clear, and structured methodology for obtaining high performance on many-cores. We present a system that supports this methodology, offers multiple levels of abstraction to provide programmers a trade-off between high-level and low-level programming, and provides programmers detailed performance feedback. We evaluate our methodology with several widely varying compute kernels on two different many-core architectures: a Graphical Processing Unit (GPU) and the Xeon Phi. We show that our methodology gives insight in the performance, and that in almost all cases, we gain a substantial performance improvement using our methodology.

AB - Many-core hardware is targeted specifically at obtaining high performance, but reaching high performance is often challenging because hardware-specific details have to be taken into account. Although there are many programming systems that try to alleviate many-core programming, some providing a high-level language, others providing a low-level language for control, none of these systems have a clear and systematic methodology as a foundation. In this article, we propose stepwise-refinement for performance: a novel, clear, and structured methodology for obtaining high performance on many-cores. We present a system that supports this methodology, offers multiple levels of abstraction to provide programmers a trade-off between high-level and low-level programming, and provides programmers detailed performance feedback. We evaluate our methodology with several widely varying compute kernels on two different many-core architectures: a Graphical Processing Unit (GPU) and the Xeon Phi. We show that our methodology gives insight in the performance, and that in almost all cases, we gain a substantial performance improvement using our methodology.

U2 - 10.1002/cpe.3416

DO - 10.1002/cpe.3416

M3 - Article

VL - 27

JO - Concurrency and Computation: Practice and Experience

JF - Concurrency and Computation: Practice and Experience

SN - 1532-0626

ER -