I've basically been playing with a simple Q-learning setup to get better at ML, but I've hit a very simple problem that I can't figure out:
for i in range(runs * amt_per_step):
    done = None
    didgood = None
    newstate = None
    lastq = None
    results = None
    starter = env.start()[0]
    render = False
    if i % 50 == 0:
        render = True
    if i == 0:
        pass
    if i == 1:
        action = np.argmax(get_discrete_state(starter)) + 1
        a, b, c, d, e = env.step(starter, action)
        done = a
        didgood = b
        newstate = c
        lastq = d
        results = e
    if didgood == False:
        reward = -1
    else:
        reward = 0
    new_q = (1 - learning_rate) * lastq + learning_rate * (reward + discount * 3)
newstate and all the other variables end up as None instead of whatever the step() function should have assigned to them. Why are they never set?
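A minimal sketch of the likely fix, under a few assumptions: the step block is guarded by if i == 1:, so env.step() runs on exactly one iteration of the loop; on every other iteration done, didgood, newstate, lastq, and results keep their initial None values, and the new_q line then fails on lastq. Calling env.step() on every iteration avoids that. The sketch below also replaces the literal 3 in the update with the standard Q-learning target, reward + discount * max future Q; it assumes env, get_discrete_state(), runs, amt_per_step, learning_rate, and discount exist as in your snippet, that env.step() returns the five values in the order you unpack them, and that get_discrete_state() returns a state's row of Q-values (which your np.argmax call over it suggests). The name max_future_q is just illustrative.

import numpy as np

# Assumes env, get_discrete_state, runs, amt_per_step,
# learning_rate and discount are defined as in the question.
for i in range(runs * amt_per_step):
    starter = env.start()[0]
    render = (i % 50 == 0)

    # Step on every iteration, not only when i == 1, so the
    # variables below are always assigned by env.step().
    action = np.argmax(get_discrete_state(starter)) + 1
    done, didgood, newstate, lastq, results = env.step(starter, action)

    reward = -1 if not didgood else 0

    # Standard Q-learning target: the best Q-value reachable from
    # the new state (assumption: get_discrete_state(newstate)
    # returns that state's Q-values, as the argmax above suggests).
    max_future_q = np.max(get_discrete_state(newstate))
    new_q = (1 - learning_rate) * lastq + learning_rate * (reward + discount * max_future_q)

With that change the unpacked values are fresh on every pass through the loop, so none of them can still be None by the time the update line runs; the if i == 0: pass branch does nothing and can simply be dropped.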