Rescorla–Wagner model simulation

The Rescorla-Wagner (R-W) rule is based on a simple linear prediction of the reward associated with a stimulus. The R-W model captures critical aspects of Pavlovian experiments (classical conditioning).

Variables

Conditioned stimulus: $ x \in \{ 0,1 \} $.

Unconditioned stimulus: $ r\in \{ 0,1 \} $.

Associative strength between $x$ and $r$: $ w\in \mathbb{R} $.

e.g. on hearing a tone $x$, how strongly ($w$) the animal expects cheese $r$.

Prediction error: $ \delta = r-wx $

With learning, the prediction error gradually approaches zero, meaning the outcome becomes less and eventually no longer surprising.
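As a small illustration of the definition above, the following sketch computes $\delta$ for a few hand-picked cases. The function name `prediction_error` and the example values are assumptions chosen for illustration, not part of the original model description.

```python
def prediction_error(r, w, x):
    """Return delta = r - w * x for a single trial."""
    return r - w * x

# Tone and cheese, no association learned yet: full surprise.
print(prediction_error(r=1, w=0.0, x=1))   # 1.0
# Tone and cheese, association fully learned: no surprise.
print(prediction_error(r=1, w=1.0, x=1))   # 0.0
# Tone but no cheese after partial learning: negative surprise.
print(prediction_error(r=0, w=0.5, x=1))   # -0.5
```

A positive $\delta$ means the outcome was better than predicted, a negative $\delta$ means it was worse.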

Learning process

How does the associative strength change, i.e. how does the animal learn that the tone and the cheese are associated?

Following the Rescorla-Wagner rule, which is designed to minimise $\frac{1}{2} \delta^2$, the associative strength $w$ is updated by adding the prediction error scaled by the learning rate $\alpha$, i.e. how fast the animal learns.

$$ w \leftarrow w + \alpha(r - wx)x$$
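To make the role of $\alpha$ concrete, here is a minimal sketch that applies this update repeatedly for two learning rates. The helpers `rw_update` and `run`, the ten-trial horizon and the values 0.1 and 0.5 are illustrative assumptions.

```python
def rw_update(w, x, r, alpha):
    """One Rescorla-Wagner step: w <- w + alpha * (r - w * x) * x."""
    delta = r - w * x
    return w + alpha * delta * x

def run(alpha, n_trials=10, x=1, r=1, w0=0.0):
    """Apply the update n_trials times and return the final w."""
    w = w0
    for _ in range(n_trials):
        w = rw_update(w, x, r, alpha)
    return w

w_slow = run(alpha=0.1)
w_fast = run(alpha=0.5)
print(f"alpha=0.1 -> w after 10 trials: {w_slow:.3f}")
print(f"alpha=0.5 -> w after 10 trials: {w_fast:.3f}")

# When the tone is absent (x = 0), the weight is left unchanged.
print(rw_update(w=0.3, x=0, r=1, alpha=0.5))
```

With the same number of trials, the larger learning rate ends closer to $r = 1$, and a trial without the stimulus leaves $w$ untouched because the update is multiplied by $x$.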

If $x$ is always 1 (i.e. the tone is presented on every trial), the above equation simplifies to

$$ w \leftarrow w + \alpha(r - w)$$
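With $x = 1$ and a constant reward $r$, this recursion can be unrolled: each update shrinks the gap $r - w$ by a factor $(1-\alpha)$, so after $n$ updates starting from $w_0$,

$$ w_n = r - (r - w_0)(1-\alpha)^n $$

This closed form is a derivation added here for illustration. The sketch below checks it numerically against the iterative rule, using assumed values $\alpha = 0.1$, $r = 1$ and $w_0 = 0$.

```python
alpha, r, w0 = 0.1, 1.0, 0.0   # illustrative values
n_trials = 50

# Iterate the simplified rule w <- w + alpha * (r - w).
w = w0
for _ in range(n_trials):
    w = w + alpha * (r - w)

# Closed form obtained by unrolling the recursion.
w_closed = r - (r - w0) * (1 - alpha) ** n_trials

# The two values should agree up to floating-point rounding.
print(abs(w - w_closed) < 1e-12)
```

For $0 < \alpha < 1$ the factor $(1-\alpha)^n$ decays geometrically, which is why $w$ approaches $r$ quickly at first and then levels off.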

PS. The learning process defined by the R-W rule is the same as stochastic gradient descent (SGD) on $\frac{1}{2} \delta^2$.

The gradient of $\frac{1}{2} \delta^2$ is

$$ grad = \frac{d}{dw} \frac{1}{2} \delta^2 = \frac{d}{dw} \frac{1}{2} (r-wx)^2 = -(r-wx)x$$

Update $w$ by stepping against the gradient

$$ w \leftarrow w - \alpha \times grad $$

which is exactly

$$ w \leftarrow w + \alpha(r - wx)x $$
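The relation between the R-W increment and the gradient of $\frac{1}{2} \delta^2$ can also be checked numerically. The sketch below estimates the derivative by central finite differences and compares the R-W increment $\alpha(r - wx)x$ with $-\alpha$ times that derivative. The function names and the test point are assumptions chosen for illustration.

```python
def loss(w, x, r):
    """Half squared prediction error, 0.5 * (r - w * x) ** 2."""
    return 0.5 * (r - w * x) ** 2

def numerical_grad(w, x, r, eps=1e-6):
    """Central finite-difference estimate of d(loss)/dw."""
    return (loss(w + eps, x, r) - loss(w - eps, x, r)) / (2 * eps)

w, x, r, alpha = 0.3, 1.0, 1.0, 0.1   # illustrative values

rw_step = alpha * (r - w * x) * x            # increment from the R-W rule
gd_step = -alpha * numerical_grad(w, x, r)   # step against the gradient

# The two increments should match up to rounding error.
print(rw_step, gd_step)
```

Since the two increments agree, the R-W update moves $w$ downhill on the squared prediction error.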

Simulation

# R-W Model

import numpy as np
import matplotlib.pyplot as plt

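trial_ind = np.arange(50)

x_lst = np.full(len(trial_ind), 1)  # CS (tone) present on every trial
r_lst = np.full(len(trial_ind), 1)  # US (cheese) delivered on every trial
w_lst = []       # associative strength per trial
delta_lst = []   # prediction error per trial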

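# init: no association before learning
w_lst.append(0)

lr = 0.1  # learning rate alpha

for i in range(len(trial_ind)):
    # prediction error on trial i
    delta = r_lst[i] - w_lst[i] * x_lst[i]
    # R-W update: w <- w + alpha * delta * x
    w = w_lst[i] + lr * delta * x_lst[i]
    delta_lst.append(delta)
    w_lst.append(w)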

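# remove init w so that w_lst holds the weight after each trial
w_lst.pop(0)

# plot each curve in its own figure
plt.figure()
plt.title("prediction error decreases to 0")
plt.plot(trial_ind, delta_lst)

plt.figure()
plt.title("associative strength increases to 1")
plt.plot(trial_ind, w_lst)
plt.show()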

[Figure: prediction error $\delta$ per trial, decreasing to 0]

[Figure: associative strength $w$ per trial, increasing to 1]

Reference

Dayan, Peter, and Laurence F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2005.