Optimizing convolution operations on GPUs using adaptive tiling

B. van Werkhoven, J. Maassen, H.E. Bal, F.J. Seinstra

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data. High-performance computing techniques are necessary to satisfy the ever-increasing computational demands of MMCA applications. The introduction of Graphics Processing Units (GPUs) in modern cluster systems presents application developers with a challenge: while GPUs are well known to be capable of providing significant performance improvements, they also vastly increase programming complexity. To address this, we have extended a user-transparent parallel programming model for MMCA, named Parallel-Horus, to allow the execution of compute-intensive operations on the GPUs present in the cluster. The most important class of operations in the MMCA domain is convolution, which is typically responsible for a large fraction of the execution time. Existing optimization approaches for CUDA kernels in general, as well as those specific to convolution operations, are too limited in both performance and flexibility. In this paper, we present a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs. To the best of our knowledge, our implementation is the most optimized and best-performing implementation of 2D convolution in the spatial domain available to date. © 2013 Elsevier B.V. All rights reserved.
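The operation the paper accelerates, 2D convolution in the spatial domain, can be illustrated with a minimal C sketch. This is not the authors' implementation, and it does not reproduce adaptive tiling, which the abstract does not describe in detail. The function names `conv2d_naive` and `conv2d_tiled`, the row-major layout, and the assumption that the input image already includes a border of `fh - 1` rows and `fw - 1` columns are all illustrative choices. As in many image-processing libraries, the filter is applied without flipping (strictly a correlation); for symmetric filters both forms give the same result.

```c
#include <assert.h>

/* Reference 2D convolution in the spatial domain (correlation form: the
 * filter is not flipped). Assumed layout, all arrays row-major:
 *   in  : (h + fh - 1) rows x (w + fw - 1) columns, border already included
 *   f   : fh x fw filter weights
 *   out : h x w result                                                    */
void conv2d_naive(const float *in, const float *f, float *out,
                  int w, int h, int fw, int fh) {
    const int in_w = w + fw - 1;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float sum = 0.0f;
            for (int i = 0; i < fh; i++)
                for (int j = 0; j < fw; j++)
                    sum += in[(y + i) * in_w + (x + j)] * f[i * fw + j];
            out[y * w + x] = sum;
        }
}

/* Same result, computed one output tile at a time. On a GPU each tile would
 * usually be assigned to one thread block, which first stages its
 * (tile_h + fh - 1) x (tile_w + fw - 1) input region in shared memory; this
 * CPU version only shows the decomposition, with a fixed caller-chosen tile
 * size (an assumption, not the paper's adaptive scheme). */
void conv2d_tiled(const float *in, const float *f, float *out,
                  int w, int h, int fw, int fh,
                  int tile_w, int tile_h) {
    assert(tile_w > 0 && tile_h > 0);
    const int in_w = w + fw - 1;
    for (int ty = 0; ty < h; ty += tile_h)
        for (int tx = 0; tx < w; tx += tile_w) {
            /* Clip the last tile in each dimension to the image edge. */
            const int y_end = (ty + tile_h < h) ? ty + tile_h : h;
            const int x_end = (tx + tile_w < w) ? tx + tile_w : w;
            for (int y = ty; y < y_end; y++)
                for (int x = tx; x < x_end; x++) {
                    float sum = 0.0f;
                    for (int i = 0; i < fh; i++)
                        for (int j = 0; j < fw; j++)
                            sum += in[(y + i) * in_w + (x + j)] * f[i * fw + j];
                    out[y * w + x] = sum;
                }
        }
}
```

Both functions produce identical output; tiling only changes the order in which output pixels are visited. A GPU version would typically replace the two outer tile loops with a grid of thread blocks, which makes the tile dimensions a tuning parameter.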
Language: English
Pages: 14-26
Journal: Future Generation Computer Systems
Volume: 30
Issue number: 1
DOI: 10.1016/j.future.2013.09.003
Publication status: Published - 2014

@article{e341ee43cfe14f99a4af6ca1418f01dd,
title = "Optimizing convolution operations on GPUs using adaptive tiling",
abstract = "The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data. High-performance computing techniques are necessary to satisfy the ever-increasing computational demands of MMCA applications. The introduction of Graphics Processing Units (GPUs) in modern cluster systems presents application developers with a challenge: while GPUs are well known to be capable of providing significant performance improvements, they also vastly increase programming complexity. To address this, we have extended a user-transparent parallel programming model for MMCA, named Parallel-Horus, to allow the execution of compute-intensive operations on the GPUs present in the cluster. The most important class of operations in the MMCA domain is convolution, which is typically responsible for a large fraction of the execution time. Existing optimization approaches for CUDA kernels in general, as well as those specific to convolution operations, are too limited in both performance and flexibility. In this paper, we present a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs. To the best of our knowledge, our implementation is the most optimized and best-performing implementation of 2D convolution in the spatial domain available to date. {\textcopyright} 2013 Elsevier B.V. All rights reserved.",
author = "{van Werkhoven}, B. and Maassen, J. and Bal, H. E. and Seinstra, F. J.",
year = "2014",
doi = "10.1016/j.future.2013.09.003",
language = "English",
volume = "30",
pages = "14--26",
journal = "Future Generation Computer Systems",
issn = "0167-739X",
publisher = "Elsevier",
number = "1",

}

Optimizing convolution operations on GPUs using adaptive tiling. / van Werkhoven, B.; Maassen, J.; Bal, H.E.; Seinstra, F.J.

In: Future Generation Computer Systems, Vol. 30, No. 1, 2014, pp. 14-26.

TY  - JOUR
T1  - Optimizing convolution operations on GPUs using adaptive tiling
AU  - van Werkhoven, B.
AU  - Maassen, J.
AU  - Bal, H.E.
AU  - Seinstra, F.J.
PY  - 2014
Y1  - 2014
N2  - The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data. High-performance computing techniques are necessary to satisfy the ever-increasing computational demands of MMCA applications. The introduction of Graphics Processing Units (GPUs) in modern cluster systems presents application developers with a challenge: while GPUs are well known to be capable of providing significant performance improvements, they also vastly increase programming complexity. To address this, we have extended a user-transparent parallel programming model for MMCA, named Parallel-Horus, to allow the execution of compute-intensive operations on the GPUs present in the cluster. The most important class of operations in the MMCA domain is convolution, which is typically responsible for a large fraction of the execution time. Existing optimization approaches for CUDA kernels in general, as well as those specific to convolution operations, are too limited in both performance and flexibility. In this paper, we present a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs. To the best of our knowledge, our implementation is the most optimized and best-performing implementation of 2D convolution in the spatial domain available to date. © 2013 Elsevier B.V. All rights reserved.
AB  - The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data. High-performance computing techniques are necessary to satisfy the ever-increasing computational demands of MMCA applications. The introduction of Graphics Processing Units (GPUs) in modern cluster systems presents application developers with a challenge: while GPUs are well known to be capable of providing significant performance improvements, they also vastly increase programming complexity. To address this, we have extended a user-transparent parallel programming model for MMCA, named Parallel-Horus, to allow the execution of compute-intensive operations on the GPUs present in the cluster. The most important class of operations in the MMCA domain is convolution, which is typically responsible for a large fraction of the execution time. Existing optimization approaches for CUDA kernels in general, as well as those specific to convolution operations, are too limited in both performance and flexibility. In this paper, we present a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs. To the best of our knowledge, our implementation is the most optimized and best-performing implementation of 2D convolution in the spatial domain available to date. © 2013 Elsevier B.V. All rights reserved.
U2  - 10.1016/j.future.2013.09.003
DO  - 10.1016/j.future.2013.09.003
M3  - Article
VL  - 30
SP  - 14
EP  - 26
JO  - Future Generation Computer Systems
T2  - Future Generation Computer Systems
JF  - Future Generation Computer Systems
SN  - 0167-739X
IS  - 1
ER  - 