Reinforcement Learning with History Lists

Title: Reinforcement Learning with History Lists
Author(s): Timmer, Stephan
First reviewer: Prof. Dr. Martin Riedmiller
Second reviewer: Prof. Dr. Kai-Uwe Kühnberger
Abstract: A very general framework for modeling uncertainty in learning environments is given by Partially Observable Markov Decision Processes (POMDPs). In a POMDP setting, the learning agent infers a policy for acting optimally in all possible states of the environment, while receiving only observations of these states. The basic idea for coping with partial observability is to incorporate memory into the representation of the policy. Perfect memory is provided by the belief space, i.e., the space of probability distributions over environmental states. However, computing policies defined on the belief space requires a considerable amount of prior knowledge about the learning problem and is expensive in terms of computation time. In this thesis, we present a reinforcement learning algorithm for solving deterministic POMDPs based on short-term memory. Short-term memory is implemented by sequences of past observations and actions, which are called history lists. In contrast to belief states, history lists are not capable of representing optimal policies, but they are far more practical and require no prior knowledge about the learning problem. The algorithm presented learns policies consisting of two separate phases. During the first phase, the learning agent collects information by actively establishing a history list that identifies the current state. This phase is called the efficient identification strategy. After the current state has been determined, the Q-Learning algorithm is used to learn a near-optimal policy. We show that such a procedure can also be used to solve large Markov Decision Processes (MDPs). Solving MDPs with continuous, multi-dimensional state spaces requires some form of abstraction over states. One particular way of establishing such an abstraction is to ignore the original state information and consider only features of states. This form of state abstraction is closely related to POMDPs, since features of states can be interpreted as observations of states.
Keywords: Reinforcement Learning; POMDP; State Abstraction; Short-Term Memory
Date of issue: 13-Mar-2009
Appears in collections: FB06 - E-Dissertationen
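
The idea in the abstract of running Q-Learning on history lists instead of on hidden states can be illustrated with a minimal sketch. This is not the algorithm from the thesis: the efficient identification strategy is not reproduced here, and the history list is simply truncated to a fixed length. The corridor environment and the names `OBS`, `step`, `extend`, `train`, and `rollout`, as well as all parameter values, are assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical toy problem (not from the thesis): a 5-cell corridor with the
# goal in the middle. Cells 1 and 3 emit the same observation, so a single
# observation does not identify the state.
OBS = ["left_end", "hall", "goal", "hall", "right_end"]
ACTIONS = (-1, +1)  # step left, step right
GOAL = 2


def step(state, action):
    """Deterministic transition: returns (next_state, observation, reward, done)."""
    nxt = min(max(state + action, 0), len(OBS) - 1)
    done = nxt == GOAL
    return nxt, OBS[nxt], (1.0 if done else -0.1), done


def extend(history, action, obs, k):
    """Append (action, obs) and keep only the k most recent pairs (k >= 1)."""
    return (history + ((action, obs),))[-k:]


def train(episodes=2000, k=2, alpha=0.2, gamma=0.95, eps=0.2, seed=0):
    """Tabular Q-Learning whose table is keyed by (history list, action)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = rng.choice([0, 1, 3, 4])
        history = ((None, OBS[state]),)  # first observation, no action yet
        for _ in range(50):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(history, a)])
            state, obs, reward, done = step(state, action)
            nxt = extend(history, action, obs, k)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(history, action)] += alpha * (target - q[(history, action)])
            history = nxt
            if done:
                break
    return q


def rollout(q, start, k=2, max_steps=10):
    """Follow the greedy policy; return steps to the goal, or None on failure."""
    state, history = start, ((None, OBS[start]),)
    for t in range(1, max_steps + 1):
        action = max(ACTIONS, key=lambda a: q[(history, a)])
        state, obs, _, done = step(state, action)
        history = extend(history, action, obs, k)
        if done:
            return t
    return None


if __name__ == "__main__":
    q = train()
    for s in (0, 1, 3, 4):
        print(s, rollout(q, s))
```

In this toy corridor, the first "hall" observation is ambiguous, but any "hall" observation preceded by an action pins down the cell, so a history list of two pairs is enough for the greedy policy to act on. Larger problems would generally need longer history lists or a deliberate identification phase such as the one the abstract describes.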

Files in this resource:
File: E-Diss873_thesis.pdf
Description: Presentation format
Size: 1.06 MB
Format: Adobe PDF

All resources in the repOSitorium are protected by copyright unless otherwise indicated.