Learning continuous-time working memory tasks with on-policy neural reinforcement learning

Davide Zambrano*, Pieter R. Roelfsema, Sander Bohte

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

An animal’s ability to learn how to make decisions based on sensory evidence is often well described by Reinforcement Learning (RL) frameworks. These frameworks, however, typically apply to event-based representations and lack the explicit, fine-grained notion of time needed to study psychophysically relevant measures like reaction times and psychometric curves. Here, we develop and use a biologically plausible continuous-time RL scheme, CT-AuGMEnT (Continuous-Time Attention-Gated MEmory Tagging), to study these behavioural quantities. We show how CT-AuGMEnT implements on-policy SARSA learning as a biologically plausible form of reinforcement learning with working memory units, using ‘attentional’ feedback. The CT-AuGMEnT model efficiently learns tasks in continuous time and can learn to accumulate relevant evidence through time, which allows the model to link task difficulty to psychophysical measurements such as accuracy and reaction times. We further show how implementing a separate accessory network for feedback allows the model to learn continuously, even in the case of significant transmission delays between the network’s feedforward and feedback layers, and even when the accessory network is randomly initialized. Our results demonstrate that CT-AuGMEnT is a fully time-continuous, biologically plausible, end-to-end RL model for learning to integrate evidence and make decisions.
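The abstract describes the key ingredients of CT-AuGMEnT: on-policy SARSA learning unrolled in continuous time, working-memory units that accumulate evidence, and synaptic ‘tags’ (eligibility traces) that bridge the gap between actions and delayed rewards. The sketch below is a minimal, hypothetical illustration of those ingredients only, not the authors’ CT-AuGMEnT implementation: it uses Euler-discretized continuous time, a linear Q-function, a leaky working-memory trace, and exponentially decaying eligibility traces on a toy delayed-evidence task. All names, constants, and the task itself are assumptions made for this example.

```python
# Minimal sketch: continuous-time on-policy SARSA with eligibility traces
# ("tags") and a leaky working-memory trace. Hypothetical illustration of
# the ideas in the abstract, NOT the authors' CT-AuGMEnT implementation.
import numpy as np

rng = np.random.default_rng(0)

DT = 0.02          # simulation time step (s)
TAU_MEM = 1.0      # working-memory integration time constant (s)
LAMBDA = 0.2       # eligibility-trace decay rate (1/s)
GAMMA_RATE = 0.05  # continuous-time discount rate (1/s)
ALPHA = 0.1        # learning rate
EPS = 0.1          # epsilon-greedy exploration
N_OBS, N_ACT = 4, 2

# Linear Q-weights over [instantaneous observation, memory trace].
w = np.zeros((N_ACT, 2 * N_OBS))

def features(obs, mem):
    return np.concatenate([obs, mem])

def q_values(phi):
    return w @ phi

def select_action(phi):
    if rng.random() < EPS:
        return int(rng.integers(N_ACT))
    return int(np.argmax(q_values(phi)))

def run_trial(T=2.0):
    """One trial: the sign of a noisy cue, shown only early in the trial,
    must be held in memory to pick the rewarded action near trial end."""
    global w
    cue = rng.choice([-1.0, 1.0])
    mem = np.zeros(N_OBS)
    trace = np.zeros_like(w)          # eligibility traces ("tags")
    phi = features(np.zeros(N_OBS), mem)
    a = select_action(phi)
    n_steps = int(T / DT)
    r = 0.0
    for step in range(n_steps):
        t = step * DT
        # Decay all tags, then strengthen the tag of the chosen action.
        trace *= np.exp(-(LAMBDA + GAMMA_RATE) * DT)
        trace[a] += phi
        # Noisy evidence only during the first 0.5 s; zero afterwards.
        obs = (cue + rng.normal(0.0, 1.0, N_OBS)) * (t < 0.5)
        # Leaky integrator: memory accumulates evidence over time.
        mem += DT / TAU_MEM * (obs - mem)
        phi_next = features(obs, mem)
        a_next = select_action(phi_next)
        done = step == n_steps - 1
        # Reward at trial end if the last action matches the cue sign.
        r = float(done and a == int(cue > 0))
        gamma = np.exp(-GAMMA_RATE * DT)  # per-step discount factor
        q_next = 0.0 if done else gamma * q_values(phi_next)[a_next]
        delta = r + q_next - q_values(phi)[a]     # SARSA TD error
        w += ALPHA * delta * trace                # tag-gated weight update
        phi, a = phi_next, a_next
    return r

if __name__ == "__main__":
    rewards = [run_trial() for _ in range(2000)]
    print("mean reward, last 200 trials:", np.mean(rewards[-200:]))
```

The continuous-time flavour enters through the time step DT: discounting and trace decay are expressed as rates (exp(-rate * DT) per step), so the same learner can be simulated at any temporal resolution, which is what makes reaction-time-style measurements possible in this family of models.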

Original language: English
Pages (from-to): 635-656
Number of pages: 22
Journal: Neurocomputing
Volume: 461
Publication status: Published - 21 Oct 2021

Bibliographical note

Funding Information:
Prof. Pieter Roelfsema (M) is director of the Netherlands Institute for Neuroscience, where he also heads the “Vision & Cognition” lab. Additionally, he is a part-time professor at the University of Amsterdam and at the Free University Amsterdam. He investigates how neurons in different brain areas work together during visual cognition, and he proposed the influential theory that the processing of visual stimuli occurs in different phases, with different contributions from feedforward and feedback connections. Roelfsema has received many awards, including the NWO VICI award and an EU ERC Advanced Grant.

Funding Information:
DZ was supported by NWO (Ideas grant 656.000.005, “DEVIS”), and PRR was supported by NWO (Ideas grant 656.000.002, “REASON”) and by the European Union (grant agreement 7202070, “Human Brain Project”, and ERC grant agreement 339490, “Cortic_al_gorithms”).

Publisher Copyright:
© 2021 The Author(s)


Funding


Funders and funder numbers:
EU ERC
European Union: 7202070
Horizon 2020 Framework Programme: 945539
European Research Council: 339490
Nederlandse Organisatie voor Wetenschappelijk Onderzoek: 656.000.005, 656.000.002

Keywords

• Continuous-time SARSA
• Neural networks
• Reinforcement learning
• Selective attention
• Working memory
