Python

TD Learning model simulation

In temporal-difference (TD) learning, the agent learns from future rewards by propagating prediction errors backward in time. At every step it updates its value estimates, revising its belief about upcoming reward as that reward approaches. TD learning is one of the core concepts of model-free reinforcement learning.
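To make the update concrete, here is a minimal sketch of tabular TD(0) on a toy problem. It is an illustration under stated assumptions, not necessarily the model simulated in this project: the environment is a hypothetical deterministic chain of states with a single reward on the final transition, and the names `td0_chain`, `alpha` (learning rate), and `gamma` (discount factor) are chosen here for the example.

```python
def td0_chain(n_states=5, reward=1.0, alpha=0.1, gamma=1.0, n_episodes=500):
    """Tabular TD(0) on a hypothetical chain: 0 -> 1 -> ... -> terminal.

    The agent always moves one state to the right and receives `reward`
    only on the transition into the terminal state (an assumed toy setup).
    """
    # One value per state, plus a terminal entry that stays at 0.
    values = [0.0] * (n_states + 1)
    for _ in range(n_episodes):
        for s in range(n_states):
            s_next = s + 1
            r = reward if s_next == n_states else 0.0
            # Prediction error: what was observed plus the next estimate,
            # minus the current estimate.
            delta = r + gamma * values[s_next] - values[s]
            # Move the current estimate a fraction alpha toward the target.
            values[s] += alpha * delta
    return values[:n_states]


if __name__ == "__main__":
    for n in (1, 2, 500):
        print(n, [round(v, 3) for v in td0_chain(n_episodes=n)])
```

In this sketch, only the state next to the reward changes after the first episode. After the second episode the error has reached the state before it, and with enough episodes every state's estimate approaches the final reward (with `gamma=1.0`). This is the backward propagation of prediction errors described above.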