Behavior Policy Gradient Supplemental Material
| Content Provider | Semantic Scholar |
|---|---|
| Author | Hanna, Josiah P.; Thomas, Philip S.; Stone, Peter; Niekum, Scott |
| Copyright Year | 2017 |
| Abstract | A. Proof of Theorem 1. In Appendix A, we give the full derivation of our primary theoretical contribution, the importance-sampling (IS) variance gradient. We also present the variance gradient for the doubly-robust (DR) estimator. We first derive an analytic expression for the gradient of the variance of an arbitrary, unbiased off-policy policy evaluation estimator, OPE(H,θ). Importance sampling is one such estimator. From this general derivation we obtain the gradient of the variance of the IS estimator and then extend it to the DR estimator. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://proceedings.mlr.press/v70/hanna17a/hanna17a-supp.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |