On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient
| Field | Value |
|---|---|
| Content Provider | CiteSeerX |
| Author | Tang, Jie; Abbeel, Pieter |
| Description | Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (i) using the past experience to estimate only the gradient of the expected return U(θ) at the current policy parameterization θ, rather than to obtain a more complete estimate of U(θ), and (ii) using past experience under the current policy only rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines, a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds. |
| Language | English |
| Publisher Institution | Advances in Neural Information Processing Systems (NIPS 2010) |
| Access Restriction | Open |
| Subject Keyword | Current Policy Parameterization; Policy Gradient Method; Likelihood Ratio Policy Gradient; Physical System; Successful Reinforcement; Likelihood Ratio Policy Gradient Method; New Policy Search Method; Several Testbeds; Expected Return; Importance Sampling; Complete Estimate; New Technique; Past Experience; Baseline Technique; Current Policy |
| Content Type | Text |
| Resource Type | Article |
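
The description above contrasts two uses of sampled trajectories: the standard likelihood ratio (REINFORCE-style) gradient estimator with a baseline, and an importance sampling view in which trajectories from past policies are reweighted to estimate the expected return U(θ) itself at new parameter settings. The sketch below is a minimal illustration of both ideas under simplifying assumptions, not the authors' algorithm: the 1-D Gaussian policy, the reward function, and helper names such as `sample_trajectory` and `log_prob_traj` are invented for the example, and a constant baseline stands in for the paper's generalized baselines.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# (1) likelihood ratio policy gradient with a constant baseline, and
# (2) importance-sampled estimate of U(theta) reusing old trajectories.
import numpy as np

rng = np.random.default_rng(0)


def sample_trajectory(theta, horizon=10):
    """Roll out a 1-D Gaussian policy a ~ N(theta, 1); reward is -a**2 per step."""
    actions = theta + rng.standard_normal(horizon)
    rewards = -actions ** 2
    return actions, rewards


def log_prob_traj(theta, actions):
    """Log-probability of the actions under N(theta, 1), up to an additive constant."""
    return -0.5 * np.sum((actions - theta) ** 2)


def grad_log_prob_traj(theta, actions):
    """Gradient of the trajectory log-probability with respect to theta."""
    return np.sum(actions - theta)


def likelihood_ratio_gradient(theta, n_traj=100):
    """grad U(theta) ~= mean over on-policy trajectories of
    grad log p_theta(tau) * (R(tau) - b), with a constant baseline b."""
    trajs = [sample_trajectory(theta) for _ in range(n_traj)]
    returns = np.array([r.sum() for _, r in trajs])
    baseline = returns.mean()  # simple variance-reduction baseline
    grads = np.array([grad_log_prob_traj(theta, a) for a, _ in trajs])
    return np.mean(grads * (returns - baseline))


def importance_sampled_return(theta, old_data):
    """Estimate U(theta) itself from trajectories gathered under earlier policies,
    reweighting each trajectory by p_theta(tau) / p_theta_old(tau)."""
    weights, weighted_returns = [], []
    for theta_old, actions, rewards in old_data:
        w = np.exp(log_prob_traj(theta, actions) - log_prob_traj(theta_old, actions))
        weights.append(w)
        weighted_returns.append(w * rewards.sum())
    # Self-normalized importance sampling keeps the estimate stable
    # when the behavior and target policies differ.
    return np.sum(weighted_returns) / (np.sum(weights) + 1e-12)


# Usage: collect data under a few past policies, then evaluate a new candidate
# policy without fresh samples, and compare with the on-policy gradient.
old_data = []
for theta_old in [0.8, 0.5, 0.3]:
    for _ in range(50):
        actions, rewards = sample_trajectory(theta_old)
        old_data.append((theta_old, actions, rewards))

theta_new = 0.1
print("IS estimate of U(theta_new):", importance_sampled_return(theta_new, old_data))
print("On-policy gradient at theta_new:", likelihood_ratio_gradient(theta_new))
```

The self-normalized weighting in `importance_sampled_return` is one common way to keep reweighted estimates usable when behavior and target policies differ; the paper's contribution goes beyond this toy setup by using all past experience to estimate U(θ) and by generalizing the constant baseline used here.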