Yang, Z. orcid.org/0000-0002-7908-9524, Guo, S., Fang, Y. et al. (2 more authors) (2024) Spiking Variational Policy Gradient for Brain Inspired Reinforcement Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence. ISSN 0162-8828
Abstract
Recent studies in reinforcement learning have explored brain-inspired function approximators and learning algorithms to simulate brain intelligence and adapt to neuromorphic hardware. Among these approaches, reward-modulated spike-timing-dependent plasticity (R-STDP) is biologically plausible and energy-efficient, but suffers from a gap between its local learning rules and the global learning objectives, which limits its performance and applicability. In this paper, we design a recurrent winner-take-all network and propose the spiking variational policy gradient (SVPG), a new R-STDP learning method derived theoretically from the global policy gradient. Specifically, the policy inference is derived from an energy-based policy function using mean-field inference, and the policy optimization is based on a last-step approximation of the global policy gradient. These fill the gap between the local learning rules and the global target. In experiments including a challenging ViZDoom vision-based navigation task and two realistic robot control tasks, SVPG successfully solves all the tasks. In addition, SVPG exhibits better inherent robustness to various kinds of input, network parameters, and environmental perturbations than compared methods.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | Reinforcement learning; reward-modulated spike-timing-dependent plasticity; spiking neural networks; variational policy gradient; winner-take-all circuit |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 31 Jan 2025 11:34 |
Last Modified: | 31 Jan 2025 11:42 |
Status: | Published online |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Identification Number: | 10.1109/tpami.2024.3511936 |
Sustainable Development Goals: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:222690 |