NDLI: Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces

Please wait, while we are loading the content...

Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces

Content Provider	Semantic Scholar
Author	Pas, Andreas Ten
Copyright Year	2012
Abstract	Many problems in Artificial Intelligence and Reinforcement Learning assume that the environment of an agent is fully observable. Imagine, for instance, a robot that moves autonomously through a hallway by employing a number of actuators and that perceives its environment through a number of sensors. As long as the sensors provide reliable information about the state of the environment, the agent can base its decisions directly on the state of the environment. In many domains, however, sensors can only provide partial information about the state of the environment. A mathematical model to deal with such domains is the partially observable Markov decision process (POMDP). Offline planning algorithms for POMDPs are known to arrive at optimal policies for POMDPs where the state space, the action space, the observation space and the planning horizon are all finite. However, these exact algorithms turn out to be computationally very expensive. As an alternative to offline planning, online planning algorithms try to overcome those computational costs. Monte Carlo Tree Search (MCTS), an online planning algorithm, has recently been successfully extended to POMDPs. The observations recorded by sensors such as laser range finders and radars are typically continuous. Such systems can be modeled by POMDPs. This thesis deals with the problem of extending MCTS to POMDPs with continuous observations. To tackle this problem, a novel online planning algorithm, called Continuous Observations Monte Carlo Tree Search (COMCTS), is proposed. The algorithm combines Monte Carlo Tree Search with an incremental regression tree learning technique that builds discretizations of the observation space and allows to incorporate the observations into the search tree. Belief based Function Approximation using Incremental Regression tree techniques and Monte Carlo (BFAIRMC) is another novel online planning algorithm presented in this thesis. The algorithm combines Monte Carlo rollouts with an incremental regression tree learning technique that creates a discretization of the belief space and allows to base the agent’s action selection directly on specific regions of the belief space. For both novel algorithms, several strategies to avoid information loss in the case of a refinement of a discretization are discussed. These strategies evolve from loosing all information over loosing partial information to loosing no information. An evaluation of the algorithms on a set of benchmark problems shows that COMCTS performs significantly better than a random algorithm but only slightly better than uniform Monte Carlo rollouts. BFAIRMC shows a clear improvement of over uniform Monte Carlo rollouts per sample but the algorithm turns out to be relatively slow in practice. The evaluation also indicates that the strategies to avoid information loss directly improve the performance of BFAIRMC per sample, but surprisingly do not improve the performance of COMCTS. Preface When I started thinking about the topic for my master’s thesis, I was still focused on a topic related to game theory. During my considerations, I kind of accidentally ran into the topic of reusing knowledge in Tree Learning Search, a recent extension of Monte Carlo Tree Search to continuous actions. I had some experience with Monte Carlo Tree Search through a project during my master’s study and got to like the ideas behind the algorithm is a lot. This planning was still during my semester abroad. So, when I came back, I contacted the inventors of this topic, but unfortunately, the topic and a similar topic had already been taken by two other students. That put some frustration into me and I had doubts that it was still possible to find some interesting and related topic. However, when I met with the inventors who later on became the supervisors for this thesis, they offered so many related ideas that my frustration was put aside. For me, this experience clearly underlines that a master’s thesis is not something that comes out of nowhere. Without the help of a group of people supporting me during the research, this thesis would not be where it is now. First of all, I would like to express my gratitude to my three supervisors. Frans, thank you for your enthusiasm, support and constructive discussion during our many meetings. There have been times that my research did not proceed as well as I hoped or did not show the results that I expected, or I lacked motivation or inspiration. Every time one meeting with you was enough to get me back on track. I am looking forward to continuing this cooperation with you in the future. Kurt, thank you for all your good comments and suggestions. It is always a real pleasure to work with you. Michael, thank you for all your ideas and inspirations which established the basic idea for this thesis and lead to very fascinating algorithms in the end. I also want to thank you for all the informative and often also entertaining joint meetings that you initiated. A special acknowledgment goes to my fellow students Lukas and Colin who worked on related topics for their master theses and with whom I had a lot of long and fruitful discussions. The exchange of knowledge and cooperation within our group has been on a much higher level than in any project group I had been part of during my studies. I would also like to thank the readers of earlier versions of this thesis, Veronika and Henning. Furthermore, I thank everyone who supported me and with whom I had very enjoyable times during my studies, especially my friends and family. Without you, I would not have been where I am now. I hope that you will enjoy reading this thesis as much as I nearly all of the time enjoyed the process of creating it.
File Format	PDF HTM / HTML
Alternate Webpage(s)	http://www.ccs.neu.edu/home/atp/publications/msc_thesis.pdf
Alternate Webpage(s)	http://homedirs.ccs.neu.edu/atp/publications/msc_thesis.pdf
Language	English
Access Restriction	Open
Content Type	Text
Resource Type	Article

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

See location in the Map
03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar bsutra@ndl.gov.in
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar bsutra@ndl.gov.in Prof. Saswat Chakrabarti will be added soon
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in