TY - GEN
T1 - Reinforcement learning as heuristic for action-rule preferences
AU - Broekens, Joost
AU - Hindriks, Koen
AU - Wiggers, Pascal
PY - 2012/04/09
Y1 - 2012/04/09
AB - A common action selection mechanism in agent-oriented programming is to base action selection on a set of rules. Since rules need not be mutually exclusive, agents are often underspecified: their decision-making leaves room for multiple choices of action. Underspecification implies there is potential for improving or optimizing the agent's behavior. Such optimization, however, is not always naturally coded using BDI-like agent concepts. In this paper, we propose an approach that exploits this potential for improvement using reinforcement learning. The approach is based on learning rule priorities to solve the rule-selection problem, and we show that it significantly improves an agent's behavior. Key here is a state representation that combines the agent's set of rules with a domain-independent heuristic based on the number of active goals. Our experiments show that this provides a useful, generic basis for learning while avoiding both the state-explosion problem and overfitting.
KW - Agent-oriented programming
KW - reinforcement learning
KW - rule preferences
UR - http://www.scopus.com/inward/record.url?scp=84859354746&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859354746&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-28939-2_2
DO - 10.1007/978-3-642-28939-2_2
M3 - Conference contribution
AN - SCOPUS:84859354746
SN - 9783642289385
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 25
EP - 40
BT - Programming Multi-Agent Systems - 8th International Workshop, ProMAS 2010, Revised Selected Papers
T2 - 8th International Workshop on Programming Multi-Agent Systems, ProMAS 2010
Y2 - 11 May 2010 through 11 May 2010
ER -