TY - JOUR
T1 - Optimization of large scale HEP data analysis in LHCb
AU - Remenska, Daniela
AU - Aaij, Roel
AU - Raven, Gerhard
AU - Merk, Marcel
AU - Templon, Jeff
AU - Bril, Reinder J.
PY - 2011
Y1 - 2011
N2 - Observation has led to the conclusion that the physics analysis jobs run by LHCb physicists on a local computing farm (i.e. non-grid) require more efficient access to the data, which resides on the Grid. Our experiments have shown that the I/O-bound nature of the analysis jobs, in combination with the latency due to the remote access protocols (e.g. rfio, dcap), causes a low CPU efficiency of these jobs. In addition to causing a low CPU efficiency, the remote access protocols give rise to high overhead (in terms of amount of data transferred). This paper gives an overview of the concept of pre-fetching and caching of input files in the proximity of the processing resources, which is exploited to cope with the I/O-bound analysis jobs. The files are copied from Grid storage elements (using GridFTP) while computations are performed concurrently, an approach inspired by a similar idea used in the ATLAS experiment. The results illustrate that this file staging approach is relatively insensitive to the original location of the data, and that a significant improvement can be achieved in terms of the CPU efficiency of an analysis job. Dealing with the scalability of such a solution in the Grid environment is discussed briefly.
AB - Observation has led to the conclusion that the physics analysis jobs run by LHCb physicists on a local computing farm (i.e. non-grid) require more efficient access to the data, which resides on the Grid. Our experiments have shown that the I/O-bound nature of the analysis jobs, in combination with the latency due to the remote access protocols (e.g. rfio, dcap), causes a low CPU efficiency of these jobs. In addition to causing a low CPU efficiency, the remote access protocols give rise to high overhead (in terms of amount of data transferred). This paper gives an overview of the concept of pre-fetching and caching of input files in the proximity of the processing resources, which is exploited to cope with the I/O-bound analysis jobs. The files are copied from Grid storage elements (using GridFTP) while computations are performed concurrently, an approach inspired by a similar idea used in the ATLAS experiment. The results illustrate that this file staging approach is relatively insensitive to the original location of the data, and that a significant improvement can be achieved in terms of the CPU efficiency of an analysis job. Dealing with the scalability of such a solution in the Grid environment is discussed briefly.
UR - http://www.scopus.com/inward/record.url?scp=84858129743&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858129743&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/331/7/072060
DO - 10.1088/1742-6596/331/7/072060
M3 - Article
AN - SCOPUS:84858129743
SN - 1742-6588
VL - 331
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - PART 7
M1 - 072060
ER -