Analysis of DIRAC's behavior using model checking with process algebra

D. Remenska, J.A. Templon, T.A.C. Willemse, H.E. Bal, K. Verstoep, W.J. Fokkink, P. Charpentier, R. Graciana Diaz, E. Lanciotti, S. Roiser, K. Ciba

Research output: Contribution to JournalArticleAcademicpeer-review

Abstract

DIRAC is the grid solution developed to support LHCb production activities as well as user data analysis. It consists of distributed services and agents delivering the workload to the grid resources. Services maintain database back-ends to store dynamic state information of entities such as jobs, queues, staging requests, etc. Agents use polling to check and possibly react to changes in the system state. Each agent's logic is relatively simple; the main complexity lies in their cooperation. Agents run concurrently, and collaborate using the databases as shared memory. The databases can be accessed directly by the agents if running locally or through a DIRAC service interface if necessary. This shared-memory model causes entities to occasionally get into inconsistent states. Tracing and fixing such problems becomes formidable due to the inherent parallelism present. We propose more rigorous methods to cope with this. Model checking is one such technique for analysis of an abstract model of a system. Unlike conventional testing, it allows full control over the parallel processes execution, and supports exhaustive state-space exploration. We used the mCRL2 language and toolset to model the behavior of two related DIRAC subsystems: the workload and storage management system. Based on process algebra, mCRL2 allows defining custom data types as well as functions over these. This makes it suitable for modeling the data manipulations made by DIRAC's agents. By visualizing the state space and replaying scenarios with the toolkit's simulator, we have detected race-conditions and deadlocks in these systems, which, in several cases, were confirmed to occur in the reality. Several properties of interest were formulated and verified with the tool. Our future direction is automating the translation from DIRAC to a formal model..
Original languageEnglish
Number of pages10
JournalJournal of Physics: Conference Series
Volume396
Issue number5
DOIs
Publication statusPublished - 2012

Fingerprint

algebra
grids
space exploration
management systems
tracing
fixing
simulators
logic
manipulators
resources
causes

Cite this

Remenska, D. ; Templon, J.A. ; Willemse, T.A.C. ; Bal, H.E. ; Verstoep, K. ; Fokkink, W.J. ; Charpentier, P. ; Graciana Diaz, R. ; Lanciotti, E. ; Roiser, S. ; Ciba, K. / Analysis of DIRAC's behavior using model checking with process algebra. In: Journal of Physics: Conference Series. 2012 ; Vol. 396, No. 5.
@article{933fdfb6e70344bf9e665ca463e46aae,
title = "Analysis of DIRAC's behavior using model checking with process algebra",
abstract = "DIRAC is the grid solution developed to support LHCb production activities as well as user data analysis. It consists of distributed services and agents delivering the workload to the grid resources. Services maintain database back-ends to store dynamic state information of entities such as jobs, queues, staging requests, etc. Agents use polling to check and possibly react to changes in the system state. Each agent's logic is relatively simple; the main complexity lies in their cooperation. Agents run concurrently, and collaborate using the databases as shared memory. The databases can be accessed directly by the agents if running locally or through a DIRAC service interface if necessary. This shared-memory model causes entities to occasionally get into inconsistent states. Tracing and fixing such problems becomes formidable due to the inherent parallelism present. We propose more rigorous methods to cope with this. Model checking is one such technique for analysis of an abstract model of a system. Unlike conventional testing, it allows full control over the parallel processes execution, and supports exhaustive state-space exploration. We used the mCRL2 language and toolset to model the behavior of two related DIRAC subsystems: the workload and storage management system. Based on process algebra, mCRL2 allows defining custom data types as well as functions over these. This makes it suitable for modeling the data manipulations made by DIRAC's agents. By visualizing the state space and replaying scenarios with the toolkit's simulator, we have detected race-conditions and deadlocks in these systems, which, in several cases, were confirmed to occur in the reality. Several properties of interest were formulated and verified with the tool. Our future direction is automating the translation from DIRAC to a formal model..",
author = "D. Remenska and J.A. Templon and T.A.C. Willemse and H.E. Bal and K. Verstoep and W.J. Fokkink and P. Charpentier and {Graciana Diaz}, R. and E. Lanciotti and S. Roiser and K. Ciba",
year = "2012",
doi = "10.1088/1742-6596/396/5/052061",
language = "English",
volume = "396",
journal = "Journal of Physics: Conference Series",
issn = "1742-6588",
publisher = "IOP Publishing Ltd.",
number = "5",

}

Remenska, D, Templon, JA, Willemse, TAC, Bal, HE, Verstoep, K, Fokkink, WJ, Charpentier, P, Graciana Diaz, R, Lanciotti, E, Roiser, S & Ciba, K 2012, 'Analysis of DIRAC's behavior using model checking with process algebra' Journal of Physics: Conference Series, vol. 396, no. 5. https://doi.org/10.1088/1742-6596/396/5/052061

Analysis of DIRAC's behavior using model checking with process algebra. / Remenska, D.; Templon, J.A.; Willemse, T.A.C.; Bal, H.E.; Verstoep, K.; Fokkink, W.J.; Charpentier, P.; Graciana Diaz, R.; Lanciotti, E.; Roiser, S.; Ciba, K.

In: Journal of Physics: Conference Series, Vol. 396, No. 5, 2012.

Research output: Contribution to JournalArticleAcademicpeer-review

TY - JOUR

T1 - Analysis of DIRAC's behavior using model checking with process algebra

AU - Remenska, D.

AU - Templon, J.A.

AU - Willemse, T.A.C.

AU - Bal, H.E.

AU - Verstoep, K.

AU - Fokkink, W.J.

AU - Charpentier, P.

AU - Graciana Diaz, R.

AU - Lanciotti, E.

AU - Roiser, S.

AU - Ciba, K.

PY - 2012

Y1 - 2012

N2 - DIRAC is the grid solution developed to support LHCb production activities as well as user data analysis. It consists of distributed services and agents delivering the workload to the grid resources. Services maintain database back-ends to store dynamic state information of entities such as jobs, queues, staging requests, etc. Agents use polling to check and possibly react to changes in the system state. Each agent's logic is relatively simple; the main complexity lies in their cooperation. Agents run concurrently, and collaborate using the databases as shared memory. The databases can be accessed directly by the agents if running locally or through a DIRAC service interface if necessary. This shared-memory model causes entities to occasionally get into inconsistent states. Tracing and fixing such problems becomes formidable due to the inherent parallelism present. We propose more rigorous methods to cope with this. Model checking is one such technique for analysis of an abstract model of a system. Unlike conventional testing, it allows full control over the parallel processes execution, and supports exhaustive state-space exploration. We used the mCRL2 language and toolset to model the behavior of two related DIRAC subsystems: the workload and storage management system. Based on process algebra, mCRL2 allows defining custom data types as well as functions over these. This makes it suitable for modeling the data manipulations made by DIRAC's agents. By visualizing the state space and replaying scenarios with the toolkit's simulator, we have detected race-conditions and deadlocks in these systems, which, in several cases, were confirmed to occur in the reality. Several properties of interest were formulated and verified with the tool. Our future direction is automating the translation from DIRAC to a formal model..

AB - DIRAC is the grid solution developed to support LHCb production activities as well as user data analysis. It consists of distributed services and agents delivering the workload to the grid resources. Services maintain database back-ends to store dynamic state information of entities such as jobs, queues, staging requests, etc. Agents use polling to check and possibly react to changes in the system state. Each agent's logic is relatively simple; the main complexity lies in their cooperation. Agents run concurrently, and collaborate using the databases as shared memory. The databases can be accessed directly by the agents if running locally or through a DIRAC service interface if necessary. This shared-memory model causes entities to occasionally get into inconsistent states. Tracing and fixing such problems becomes formidable due to the inherent parallelism present. We propose more rigorous methods to cope with this. Model checking is one such technique for analysis of an abstract model of a system. Unlike conventional testing, it allows full control over the parallel processes execution, and supports exhaustive state-space exploration. We used the mCRL2 language and toolset to model the behavior of two related DIRAC subsystems: the workload and storage management system. Based on process algebra, mCRL2 allows defining custom data types as well as functions over these. This makes it suitable for modeling the data manipulations made by DIRAC's agents. By visualizing the state space and replaying scenarios with the toolkit's simulator, we have detected race-conditions and deadlocks in these systems, which, in several cases, were confirmed to occur in the reality. Several properties of interest were formulated and verified with the tool. Our future direction is automating the translation from DIRAC to a formal model..

U2 - 10.1088/1742-6596/396/5/052061

DO - 10.1088/1742-6596/396/5/052061

M3 - Article

VL - 396

JO - Journal of Physics: Conference Series

JF - Journal of Physics: Conference Series

SN - 1742-6588

IS - 5

ER -