TY - GEN
T1 - A generic vectorization scheme and a GPU kernel for the phylogenetic likelihood library
AU - Izquierdo-Carrasco, Fernando
AU - Alachiotis, Nikolaos
AU - Berger, Simon
AU - Flouri, Tomas
AU - Pissis, Solon P.
AU - Stamatakis, Alexandros
PY - 2013/1/1
Y1 - 2013/1/1
N2 - Highly optimized library implementations for important scientific kernels can improve scientific productivity. To this end, we are currently developing the Phylogenetic Likelihood Library (PLL) that implements functions to compute and optimize the phylogenetic likelihood score on evolutionary trees. Here, we focus on novel techniques to orchestrate likelihood computations on large vector-like processors such as GPUs. We present a novel scheme for vectorizing computations and organizing conditional likelihood arrays (CLAs) in such a way that they do not need to be transferred at all between the GPU and the CPU. We compare the performance of our GPU implementation for DNA data with a highly optimized x86 version of the PLL that relies on manually tuned AVX intrinsics. Our GPU implementation accelerates the likelihood computations by a factor of two compared to the, most probably, currently fastest available x86 implementation. We conclude that, a hybrid GPU-CPU version needs to be developed and integrated into the PLL to leverage the computational power of modern desktop systems and clusters.
AB - Highly optimized library implementations for important scientific kernels can improve scientific productivity. To this end, we are currently developing the Phylogenetic Likelihood Library (PLL) that implements functions to compute and optimize the phylogenetic likelihood score on evolutionary trees. Here, we focus on novel techniques to orchestrate likelihood computations on large vector-like processors such as GPUs. We present a novel scheme for vectorizing computations and organizing conditional likelihood arrays (CLAs) in such a way that they do not need to be transferred at all between the GPU and the CPU. We compare the performance of our GPU implementation for DNA data with a highly optimized x86 version of the PLL that relies on manually tuned AVX intrinsics. Our GPU implementation accelerates the likelihood computations by a factor of two compared to the, most probably, currently fastest available x86 implementation. We conclude that, a hybrid GPU-CPU version needs to be developed and integrated into the PLL to leverage the computational power of modern desktop systems and clusters.
KW - GPU
KW - maximum likelihood
KW - OpenCL
KW - phylogenetics
KW - vector intrinsics
UR - http://www.scopus.com/inward/record.url?scp=84899736713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84899736713&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2013.103
DO - 10.1109/IPDPSW.2013.103
M3 - Conference contribution
AN - SCOPUS:84899736713
SN - 9780769549798
T3 - Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013
SP - 530
EP - 538
BT - Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013
PB - IEEE Computer Society
T2 - 2013 IEEE 37th Annual Computer Software and Applications Conference, COMPSAC 2013
Y2 - 22 July 2013 through 26 July 2013
ER -