Dynamic Preferences in Multi-Criteria Reinforcement Learning (2005)
| Field | Value |
|---|---|
| Content Provider | CiteSeerX |
| Author | Natarajan, Sriraam; Tadepalli, Prasad |
| Description | The standard framework of reinforcement learning is based on maximizing an expected return defined over scalar rewards. In many real-world situations, however, tradeoffs must be made among multiple objectives, and the agent's preferences over those objectives may vary with time. In this paper, we consider the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance. We propose a method that stores a finite number of policies, chooses an appropriate policy for any given weight vector, and improves upon it. The idea is that although there are infinitely many weight vectors, they may be well covered by a small number of optimal policies. We show this empirically in two domains: a version of the Buridan's ass problem and network routing. (See the sketch following this table for an illustration of the policy-selection step.) |
| Language | English |
| Publisher Date | 2005-01-01 |
| Published In | Proceedings of ICML-05 |
| Access Restriction | Open |
| Subject Keyword | Numeric Weight, Dynamic Preference, Scalar Reward, Different Objective, Many Weight Vector, Appropriate Policy, Reinforcement Learning, Finite Number, Optimal Policy, Small Number, Multi-criteria Reinforcement Learning, Multiple Objective, Network Routing, Agent Preference, Many Real World Situation, Current Framework, Weight Vector, Buridan's Ass Problem, Time-varying Preference |
| Content Type | Text |
| Resource Type | Article |
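
The Description above outlines a policy-library approach: keep a small set of policies and, for any incoming weight vector, select one to start from and improve. The Python sketch below is a rough, non-authoritative illustration of that selection step only; the `StoredPolicy` class, `select_policy` function, and the toy return vectors are hypothetical stand-ins, not taken from the paper, and the paper's policy-improvement step is omitted.

```python
import numpy as np

# Hypothetical sketch: each stored policy is summarized by an estimated
# vector of expected returns, one entry per objective. For a new weight
# vector, a policy's scalarized value is the dot product of the weights
# with that return vector; we would start from the stored policy that
# maximizes it and then improve it further (improvement step omitted).

class StoredPolicy:
    def __init__(self, name, expected_returns):
        self.name = name
        # expected_returns[i] = estimated expected return on objective i
        self.expected_returns = np.asarray(expected_returns, dtype=float)

    def scalarized_value(self, weights):
        # Scalar value under a preference expressed as numeric weights.
        return float(np.dot(weights, self.expected_returns))


def select_policy(library, weights):
    """Return the stored policy with the highest scalarized value
    for the given weight vector (ties broken arbitrarily)."""
    return max(library, key=lambda p: p.scalarized_value(weights))


if __name__ == "__main__":
    # Toy two-objective example (e.g. "eat" vs. "guard" in Buridan's ass).
    library = [
        StoredPolicy("mostly_eat",   [10.0, 2.0]),
        StoredPolicy("balanced",     [6.0, 6.0]),
        StoredPolicy("mostly_guard", [2.0, 10.0]),
    ]
    weights = np.array([0.7, 0.3])  # current, possibly time-varying preference
    best = select_policy(library, weights)
    print(best.name, best.scalarized_value(weights))
```

Under these assumptions, a preference shift simply means re-running `select_policy` with the new weight vector: the library itself is reused, which is the point made in the abstract that a small number of stored policies can cover infinitely many weight vectors.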