NDLI: Continuous-action reinforcement learning with fast policy search and adaptive basis function selection

Content Provider	Springer Nature Link
Author	Xu, Xin; Liu, Chunming; Hu, Dewen
Copyright Year	2010
Abstract	As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the artificial intelligence and machine learning communities. However, the generalization ability of RL is still an open problem, and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, called Continuous-action Approximate Policy Iteration (CAPI), is proposed for RL in MDPs with both continuous state and action spaces. In CAPI, a fast policy search technique uses the value functions estimated by temporal-difference learning to search for optimal actions in continuous spaces; the technique is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximations of value functions can be obtained efficiently for both linear function approximators and kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach can not only converge to a near-optimal policy in a few iterations but also obtain performance comparable to or better than Sarsa-learning and previous approximate policy iteration methods such as LSPI and KLSPI.
Starting Page	1055
Ending Page	1070
Page Count	16
File Format	PDF
ISSN	1432-7643
Journal	Soft Computing
Volume Number	15
Issue Number	6
e-ISSN	1433-7479
Language	English
Publisher	Springer-Verlag
Publisher Date	2010-03-28
Publisher Place	Berlin, Heidelberg
Access Restriction	One Nation One Subscription (ONOS)
Subject Keyword	Reinforcement learning; Approximate policy iteration; Markov decision processes; Learning control; Generalization; Artificial Intelligence (incl. Robotics); Mathematical Logic and Foundations; Computational Intelligence; Control, Robotics, Mechatronics
Content Type	Text
Resource Type	Article
Subject	Theoretical Computer Science; Software; Geometry and Topology

Similar Documents

Neuro-optimal tracking control for a class of discrete-time nonlinear systems via generalized value iteration adaptive dynamic programming approach

Learning a robot controller using an adaptive hierarchical fuzzy rule-based system

A supervised Actor–Critic approach for adaptive cruise control

Fast Marching-based globally stable motion learning

Self-adjusting harmony search-based feature selection

Multi-objective multiagent credit assignment in reinforcement learning and NSGA-II

Dynamic selection of evolutionary operators based on online learning and fitness landscape analysis

Policy iteration for bounded-parameter POMDPs

Continuous-action reinforcement learning with fast policy search and adaptive basis function selection