TY - GEN
T1 - Accelerating overlapping community detection
T2 - 26th International European Conference on Parallel and Distributed Computing, Euro-Par 2020
AU - El-Helw, Ismail
AU - Hofman, Rutger
AU - Bal, Henri E.
PY - 2020
Y1 - 2020
AB - Building efficient algorithms for data-intensive problems requires deep analysis of data access patterns. Random data access patterns exacerbate this process. In this paper, we discuss accelerating a randomized data-intensive machine learning algorithm using multi-core CPUs and several types of GPUs. A thorough analysis of the algorithm’s data dependencies enabled a 75% reduction in its memory footprint. We created custom compute kernels via code generation to identify the optimal set of data placement and computational optimizations per compute device. An empirical evaluation shows up to 245x speedup compared to an optimized sequential version. Another result from this evaluation is that achieving peak performance does not always match intuition: e.g., depending on the GPU architecture, vectorization may increase or hamper performance.
KW - Algorithms for accelerators and heterogeneous systems
KW - Combinatorial and data intensive application
KW - Performance analysis
UR - https://www.scopus.com/pages/publications/85090097403
UR - https://www.scopus.com/inward/citedby.url?scp=85090097403&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-57675-2_32
DO - 10.1007/978-3-030-57675-2_32
M3 - Conference contribution
AN - SCOPUS:85090097403
SN - 9783030576745
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 510
EP - 526
BT - Euro-Par 2020: Parallel Processing
A2 - Malawski, Maciej
A2 - Rzadca, Krzysztof
PB - Springer
Y2 - 24 August 2020 through 28 August 2020
ER -