Two Heuristics for Solving POMDPs Having a Delayed Need to Observe
| Content Provider | CiteSeerX |
|---|---|
| Author | Zubek, Valentina Bayer; Dietterich, Thomas |
| Abstract | A common heuristic for solving Partially Observable Markov Decision Problems (POMDPs) is to first solve the underlying Markov Decision Process (MDP) and then construct a POMDP policy by performing a fixed-depth lookahead search in the POMDP and evaluating the leaf nodes using the MDP value function. A problem with this approximation is that it does not account for the need to choose actions in order to gain information about the state of the world, particularly when those observation actions are needed at some point in the future. This paper proposes two heuristics that are better than the MDP approximation in POMDPs where there is a delayed need to observe. The first approximation, introduced in [2], is the even-odd POMDP, in which the world is assumed to be fully observable every other time step. The even-odd POMDP can be converted into an equivalent MDP, the even-MDP, whose value function captures some of the sensing costs of the original POMDP. An online policy, consisting of a 2-step lookahead search combined with the value function of the even-MDP, gives an approximation to the POMDP's value function that is at least as good as the method based on the value function of the underlying MDP. The second POMDP approximation is applicable to a special kind of POMDP which we call the Cost Observable Markov Decision Problem (COMDP). In a COMDP, the actions are partitioned into those that change the state of the world and those that are pure observation actions. For such problems, we describe the chain-MDP algorithm, which in many cases is able to capture more of the sensing costs than the even-odd POMDP approximation. We prove that both heuristics compute value functions that are upper bounded by (i.e., better than) the value function of the underlying MDP. |
| Access Restriction | Open |
| Subject Keyword | Even-odd POMDP; Even-odd POMDP Approximation; Common Heuristic; Value Function; MDP Value Function; Observation Action; Original POMDP; Underlying MDP; Equivalent MDP; 2-step Lookahead Search; Chain-MDP Algorithm; Time Step; Online Policy; Sensing Cost; MDP Approximation; Second POMDP Approximation; Delayed Need; Partially Observable Markov Decision Problem; POMDP Policy; Pure Observation Action; Cost Observable Markov Decision Problem; Underlying Markov Decision Process; Fixed-depth Lookahead Search; First Approximation; Special Kind |
| Content Type | Text |
| Resource Type | Article |
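
The abstract describes a scheme that is easy to prototype: a Bayes belief update plus a depth-limited lookahead search whose leaf beliefs are scored by a state value function. The sketch below is a minimal illustration under invented assumptions, not the paper's implementation: the two-state model (`T`, `Z`, `R`), the sensing cost, and the placeholder leaf values `V_MDP` are all hypothetical. Scoring leaves with the underlying MDP's value function gives the baseline approximation the abstract criticizes; the paper's first heuristic instead pairs a 2-step lookahead with the even-MDP's value function.

```python
# Minimal sketch of fixed-depth lookahead in a POMDP with leaf beliefs scored
# by a state value function. The model below (T, Z, R, V_MDP) is invented for
# illustration; it is NOT the paper's benchmark.

STATES = [0, 1]
ACTIONS = ["stay", "probe"]          # "probe" acts as a pure observation action
OBS = ["low", "high"]
GAMMA = 0.95

def T(s, a, s2):
    """Transition probability P(s2 | s, a)."""
    if a == "probe":                 # observing leaves the state unchanged
        return 1.0 if s2 == s else 0.0
    return 0.9 if s2 == s else 0.1   # "stay" mostly preserves the state

def Z(a, s2, o):
    """Observation probability P(o | a, s2)."""
    if a == "probe":                 # probing is informative about the state
        return 0.9 if (o == "high") == (s2 == 1) else 0.1
    return 0.5                       # other actions yield uninformative noise

def R(s, a):
    """Immediate reward; probing pays a sensing cost."""
    return (1.0 if s == 1 else 0.0) - (0.2 if a == "probe" else 0.0)

def belief_update(b, a, o):
    """Bayes filter: returns (posterior belief, P(o | b, a))."""
    unnorm = [Z(a, s2, o) * sum(b[s] * T(s, a, s2) for s in STATES)
              for s2 in STATES]
    p_o = sum(unnorm)
    return ([u / p_o for u in unnorm] if p_o > 0 else None), p_o

def lookahead(b, depth, v_leaf):
    """Depth-limited expectimax over the belief tree.

    At the leaves, the belief is scored by the state value function v_leaf,
    e.g. the underlying MDP's value function (the baseline) or the even-MDP's
    value function with depth=2 (the paper's first heuristic).
    """
    if depth == 0:
        return sum(b[s] * v_leaf[s] for s in STATES), None
    best_q, best_a = float("-inf"), None
    for a in ACTIONS:
        q = sum(b[s] * R(s, a) for s in STATES)       # expected immediate reward
        for o in OBS:
            b2, p_o = belief_update(b, a, o)
            if p_o > 0:
                v, _ = lookahead(b2, depth - 1, v_leaf)
                q += GAMMA * p_o * v                  # expectation over observations
        if q > best_q:
            best_q, best_a = q, a
    return best_q, best_a

# Placeholder values standing in for the MDP (or even-MDP) value function.
V_MDP = [10.0, 12.0]
value, action = lookahead([0.5, 0.5], depth=2, v_leaf=V_MDP)
print(f"2-step lookahead value {value:.3f}, first action: {action}")
```

Because the even-MDP assumes full observability on every other step, its value function already charges for some sensing; swapping it in for `V_MDP` at the leaves is what lets the 2-step lookahead anticipate a delayed need to observe.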