Automatic Induction of MAXQ Hierarchies.
| Content Provider | CiteSeerX |
|---|---|
| Author | Wynkoop, Mike; Ray, Soumya; Dietterich, Tom; Tadepalli, Prasad; Mehta, Neville |
| Abstract | Scaling up reinforcement learning to large domains requires leveraging the structure in the domain. Hierarchical reinforcement learning has been one of the ways in which the domain structure is exploited to constrain the value-function space of the learner and speed up learning [10, 3, 1]. In the MAXQ framework, for example, a task hierarchy is defined, and a set of relevant features to represent the completion function for each task-subtask pair is given [3], resulting in decomposed, subtask-specific value functions that are easier to learn than the global value function. The MAXQ decomposition facilitates learning separate value functions for subtasks. The task hierarchy is represented as a directed acyclic graph whose leaf nodes are the primitive subtasks. Each composite subtask defines a semi-Markov decision process (SMDP) with a set of actions (which may include primitive actions or other subtasks), a set of state variables, a termination predicate that defines a set of exit states for the subtask, and a pseudo-reward function defined over those exits. Several researchers have focused on the problem of automatically inducing temporally extended actions and task-subtask hierarchies [4, 7, 8, 9, 2, 11, 6, 5]. Discovering task-subtask … |
| Access Restriction | Open |
| Subject Keyword | Primitive Subtasks; Domain Structure; Semi-Markov Decision Process; Value Function Space; Termination Predicate; Task-subtask Hierarchy; Composite Subtask; Directed Acyclic Graph; Large Domain; Task Hierarchy; Separate Value Function; Relevant Feature; Leaf Node; Task-subtask Pair; MAXQ Hierarchy; Completion Function; Subtask-specific Value Function; Primitive Action; MAXQ Framework; Global Value Function; Automatic Induction; State Variable; Pseudo-reward Function; Hierarchical Reinforcement Learning; MAXQ Decomposition; Exit State |
| Content Type | Text |
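
The MAXQ structure described in the abstract maps naturally onto a small data structure. Below is a minimal Python sketch, assuming nothing beyond the abstract itself: the names `MaxQTask`, `value`, and `primitive_v` are hypothetical illustrations, not the paper's code. It models the task hierarchy as a DAG whose leaves are primitive subtasks, and it evaluates the standard MAXQ value decomposition V(task, s) = max_a [ V(a, s) + C(task, s, a) ].

```python
# Illustrative sketch only; all names here are hypothetical, not from the paper.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Tuple  # a state as a tuple of state-variable values (assumption)

@dataclass
class MaxQTask:
    """One node in a MAXQ task hierarchy (a directed acyclic graph).

    Leaf nodes (no children) are primitive subtasks. Each composite
    subtask defines an SMDP: child actions, relevant state variables,
    a termination predicate over exit states, and a pseudo-reward
    function defined on those exits.
    """
    name: str
    children: List["MaxQTask"] = field(default_factory=list)  # empty => primitive
    state_vars: Tuple[str, ...] = ()                           # relevant features
    terminated: Callable[[State], bool] = lambda s: False      # exit-state predicate
    pseudo_reward: Callable[[State], float] = lambda s: 0.0    # reward over exits
    # Learned completion function C(self, s, a): expected return for
    # finishing this task after child a terminates in state s.
    completion: Dict[Tuple[State, str], float] = field(default_factory=dict)

    @property
    def is_primitive(self) -> bool:
        return not self.children


def value(task: MaxQTask, s: State,
          primitive_v: Dict[Tuple[str, State], float]) -> float:
    """Decomposed value: V(task, s) = max_a [ V(a, s) + C(task, s, a) ].

    primitive_v holds each primitive subtask's learned expected reward,
    keyed by (task name, state) (hypothetical storage choice).
    """
    if task.is_primitive:
        return primitive_v.get((task.name, s), 0.0)
    return max(value(a, s, primitive_v) + task.completion.get((s, a.name), 0.0)
               for a in task.children)


# Usage sketch: a two-level hierarchy in the style of a navigation domain.
north = MaxQTask("north")
south = MaxQTask("south")
navigate = MaxQTask("navigate", children=[north, south],
                    state_vars=("row", "col"),
                    terminated=lambda s: s[0] == 0)
```

Keying the completion function by (state, child) pairs mirrors the abstract's point that each task-subtask pair has its own completion function over only its relevant features, which is what makes the decomposed value functions smaller and easier to learn than the global one.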