Reinforcement Learning

TD Learning model simulation

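In temporal-difference (TD) learning, the agent learns from predicted future rewards. At every time step it compares its current value estimate with the reward it just received plus its estimate for the next state, and uses the resulting prediction error to update that estimate. Prediction errors therefore propagate backward in time, and beliefs about future rewards keep being revised as the future approaches. TD learning is one of the core concepts of model-free reinforcement learning.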
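The update described above can be sketched as a small simulation. This is a minimal TD(0) example under assumed conditions, not a specific published model: a hypothetical chain of 5 states visited in order, a single reward of 1 on leaving the last state, and illustrative values for the learning rate `alpha` and the discount factor `gamma`. The function name `td0_chain` is also made up for this sketch.

```python
def td0_chain(n_states=5, n_episodes=1, alpha=0.1, gamma=0.9):
    """Estimate state values with TD(0) on a deterministic chain.

    Assumed toy task: the agent walks through states 0..n_states-1 in order
    and receives a reward of 1 only when it leaves the last state.
    """
    values = [0.0] * n_states  # initial beliefs about future reward

    for _ in range(n_episodes):
        for s in range(n_states):
            is_last = s == n_states - 1
            reward = 1.0 if is_last else 0.0
            # The value after the last state (terminal) is defined as 0.
            next_value = 0.0 if is_last else values[s + 1]

            # Prediction error: what happened plus what is now expected,
            # minus what was expected before.
            delta = reward + gamma * next_value - values[s]

            # Move the estimate a fraction alpha toward the new target.
            values[s] += alpha * delta

    return values
```

Calling `td0_chain(n_episodes=1)` changes only the value of the last state, because that is the only place where a prediction error occurs at first. With more episodes the error reaches earlier states one step at a time, and the estimates approach `gamma ** (n_states - 1 - s)` for state `s`.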

Reinforcement learning, dopamine, neuroimaging

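If a bell is rung before each feeding, the dog will over time come to associate the bell with food and will salivate as soon as it hears the bell. Through the food (the reinforcer), an association (contingency) forms between the bell and salivation and is gradually strengthened (reinforced), producing a conditioned reflex.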