Loading...
Please wait, while we are loading the content...
Similar Documents
Using neural networks and dyna algorithm for integrated planning, reacting and learning in systems
| Content Provider | NASA Technical Reports Server (NTRS) |
|---|---|
| Author | Lima, Pedro Beard, Randal |
| Copyright Year | 1992 |
| Description | The traditional AI answer to the decision making problem for a robot is planning. However, planning is usually CPU-time consuming, depending on the availability and accuracy of a world model. The Dyna system generally described in earlier work, uses trial and error to learn a world model which is simultaneously used to plan reactions resulting in optimal action sequences. It is an attempt to integrate planning, reactive, and learning systems. The architecture of Dyna is presented. The different blocks are described. There are three main components of the system. The first is the world model used by the robot for internal world representation. The input of the world model is the current state and the action taken in the current state. The output is the corresponding reward and resulting state. The second module in the system is the policy. The policy observes the current state and outputs the action to be executed by the robot. At the beginning of program execution, the policy is stochastic and through learning progressively becomes deterministic. The policy decides upon an action according to the output of an evaluation function, which is the third module of the system. The evaluation function takes the following as input: the current state of the system, the action taken in that state, the resulting state, and a reward generated by the world which is proportional to the current distance from the goal state. Originally, the work proposed was as follows: (1) to implement a simple 2-D world where a 'robot' is navigating around obstacles, to learn the path to a goal, by using lookup tables; (2) to substitute the world model and Q estimate function Q by neural networks; and (3) to apply the algorithm to a more complex world where the use of a neural network would be fully justified. In this paper, the system design and achieved results will be described. First we implement the world model with a neural network and leave Q implemented as a look up table. Next, we use a lookup table for the world model and implement the Q function with a neural net. Time limitations prevented the combination of these two approaches. The final section discusses the results and gives clues for future work. |
| File Size | 772542 |
| Page Count | 26 |
| File Format | |
| Alternate Webpage(s) | http://archive.org/details/NASA_NTRS_Archive_19930015554 |
| Archival Resource Key | ark:/13960/t2f818g9f |
| Language | English |
| Publisher Date | 1992-08-01 |
| Access Restriction | Open |
| Subject Keyword | Cybernetics Errors Algorithms Neural Nets Autonomous Navigation Decision Making Stochastic Processes Machine Learning Planning Artificial Intelligence Sequencing Robotics Robots Ntrs Nasa Technical Reports ServerĀ (ntrs) Nasa Technical Reports Server Aerodynamics Aircraft Aerospace Engineering Aerospace Aeronautic Space Science |
| Content Type | Text |
| Resource Type | Technical Report |