Reinforcement Learning with Replacing Eligibility Traces
| Content Provider | Semantic Scholar |
|---|---|
| Author | Singh, Satinder P.; Sutton, Richard S. |
| Copyright Year | 1996 |
| Abstract | The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well-known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator. Two fundamental mechanisms have been used in reinforcement learning to handle delayed reward. One is temporal-difference (TD) learning, as in the TD(λ) algorithm (Sutton, 1988) and in Q-learning (Watkins, 1989). TD learning in effect constructs an internal reward signal that is less delayed than the original, external one. However, TD methods can eliminate the delay completely only on fully Markov problems, which are rare in practice. In most problems some delay always remains between an action and its effective reward, and on all problems some delay is always present during the time before TD learning is complete. Thus, there is a general need for a second mechanism to handle whatever delay is not eliminated by TD learning. The second mechanism that has been widely used for this is the eligibility trace. Introduced by Klopf (1972), eligibility traces have been used in a variety of reinforcement learning systems (e.g., …). Systematic empirical studies of eligibility traces in conjunction with TD methods were made by Sutton (1984), and theoretical results … |
| File Format | PDF, HTML |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
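
To make the abstract's distinction between the two traces concrete, below is a minimal tabular TD(λ) sketch in Python. It is not the paper's offline TD(1) algorithm; it is an online update loop under assumed interfaces (the `env` object, its `reset`/`step` methods, and the parameter defaults are placeholders for illustration). The only difference between the two variants is the single line that touches the trace of the current state: the conventional (accumulating) trace increments it, so repeated visits within an episode earn extra credit, while the replacing trace resets it to 1.

```python
import numpy as np

def td_lambda_episode(env, V, alpha=0.1, gamma=1.0, lam=0.9, trace="replacing"):
    """One episode of tabular TD(lambda) with either trace type, updating V in place.

    Assumptions (not from the paper): env.reset() returns an integer state index,
    env.step(s) returns (next_state, reward, done), and V is a 1-D float numpy
    array of state-value estimates indexed by state.
    """
    e = np.zeros_like(V)                 # eligibility trace, one entry per state
    s = env.reset()
    done = False
    while not done:
        s_next, r, done = env.step(s)
        # TD error: one-step target minus the current estimate.
        target = r + (0.0 if done else gamma * V[s_next])
        delta = target - V[s]

        # Decay every trace, then mark the state just visited.
        e *= gamma * lam
        if trace == "accumulating":
            e[s] += 1.0                  # conventional trace: revisits accumulate
        else:
            e[s] = 1.0                   # replacing trace: revisits reset to 1

        # Distribute the TD error over all recently visited states.
        V += alpha * delta * e
        s = s_next
    return V
```

In the paper's offline, λ = 1 setting, the two well-known Monte Carlo methods referred to in the abstract are every-visit Monte Carlo (matching the accumulating trace) and first-visit Monte Carlo (matching the replacing trace), which is the correspondence underlying the bias and mean-squared-error analysis.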