Optimization of large scale HEP data analysis in LHCb

  • Daniela Remenska*
  • Roel Aaij
  • Gerhard Raven
  • Marcel Merk
  • Jeff Templon
  • Reinder J. Bril

  *Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-reviewed

Abstract

Observation has led to the conclusion that the physics analysis jobs run by LHCb physicists on a local (i.e. non-Grid) computing farm require more efficient access to the data, which resides on the Grid. Our experiments have shown that the I/O-bound nature of the analysis jobs, in combination with the latency of the remote access protocols (e.g. rfio, dcap), causes a low CPU efficiency for these jobs. In addition, the remote access protocols give rise to high overhead in terms of the amount of data transferred. This paper gives an overview of the concept of pre-fetching and caching input files in the proximity of the processing resources, which is exploited to cope with the I/O-bound analysis jobs. The files are copied from Grid storage elements (using GridFTP) while computations are performed concurrently, an approach inspired by a similar idea used in the ATLAS experiment. The results illustrate that this file staging approach is relatively insensitive to the original location of the data, and that a significant improvement can be achieved in the CPU efficiency of an analysis job. The scalability of such a solution in the Grid environment is discussed briefly.
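
The pre-fetching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the names staged_files, fetch and max_staged are hypothetical, and fetch stands in for whatever copies one remote file to local disk (for example, a wrapper around a GridFTP client). The cpu_efficiency helper assumes the usual definition of CPU efficiency as CPU time divided by wall-clock time, which the abstract does not spell out.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator

_DONE = object()  # sentinel marking the end of the input list


def staged_files(
    remote_urls: Iterable[str],
    fetch: Callable[[str], str],
    max_staged: int = 2,
) -> Iterator[str]:
    """Yield local paths while a background thread stages the next files.

    `fetch` is a hypothetical callable that copies one remote file to local
    disk and returns the local path. `max_staged` bounds how many fetched
    files may wait in the local cache at once.
    """
    staged: queue.Queue = queue.Queue(maxsize=max_staged)

    def stager() -> None:
        try:
            for url in remote_urls:
                staged.put(fetch(url))  # blocks while the cache is full
        except Exception as exc:
            staged.put(exc)  # hand the error over to the consumer
        finally:
            staged.put(_DONE)

    # The stager runs concurrently with whatever the caller does per file.
    threading.Thread(target=stager, daemon=True).start()

    while True:
        item = staged.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Assumed definition: fraction of wall-clock time spent on the CPU."""
    return cpu_seconds / wall_seconds
```

An analysis loop would iterate over staged_files(urls, fetch) and process each local path as it arrives; while one file is being processed, the background thread is already copying the next ones. The bounded queue keeps the local cache from growing beyond max_staged files.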

Original language: English
Article number: 72060
Journal: Journal of Physics: Conference Series
Volume: 331
Issue number: PART 7
DOIs
Publication status: Published - 2011

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 8 - Decent Work and Economic Growth
