One of the most urgent challenges in Reinforcement Learning research is the lack of reproducibility. Therefore, to further the understanding of the training behavior of Reinforcement Learning agents, we analyze the training of agents playing the established baseline environment Taxi. In particular, we contrast results based on different forms of exploration. In addition, we can demonstrate that in this context penalization without termination is to be the preferred punishment for incorrect actions.
«
One of the most urgent challenges in Reinforcement Learning research is the lack of reproducibility. Therefore, to further the understanding of the training behavior of Reinforcement Learning agents, we analyze the training of agents playing the established baseline environment Taxi. In particular, we contrast results based on different forms of exploration. In addition, we can demonstrate that in this context penalization without termination is to be the preferred punishment for incorrect actio...
»