Question

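I'm working on a reinforcement learning program and I'm using this article as a reference. I'm using Python with Keras (Theano) to build the neural network, and the pseudocode I'm following is: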

1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.

2. Do a feedforward pass for the next state s' and calculate the maximum over all network outputs, max_a' Q(s', a').

3. Set the Q-value target for the action taken to r + γ max_a' Q(s', a') (use the max calculated in step 2). For all other actions, set the Q-value target to the same value originally returned from step 1, making the error 0 for those outputs.

4. Update the weights using backpropagation.
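The first three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the two lists stand in for the network's outputs for s and s', the function name `build_q_target` and the three-action setup are hypothetical, and γ = 1 is assumed.

```python
def build_q_target(q_current, q_next, action, reward, gamma):
    """Return the training target for one transition (s, a, r, s')."""
    # Start from the current predictions so that the error is 0
    # for every action that was not taken.
    target = list(q_current)
    # Overwrite only the taken action with r + gamma * max_a' Q(s', a').
    target[action] = reward + gamma * max(q_next)
    return target


# Hypothetical network outputs for three actions
# (stand-ins for what a forward pass would return).
q_s = [0.2, 0.6892, 0.4]        # Q(s, .)
q_s_next = [0.1, 0.8375, 0.3]   # Q(s', .)

target = build_q_target(q_s, q_s_next, action=1, reward=1.0, gamma=1.0)
errors = [t - q for t, q in zip(target, q_s)]

print(target)  # only index 1 differs from q_s
print(errors)  # non-zero only for the action that was taken
```

Passing `target` to the training step then leaves the untouched outputs with zero error, so only the output for the taken action contributes to the gradient.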

The loss function equation here is

$$L = \frac{1}{2}\left[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right]^2$$

My reward is +1, max Q(s', a') = 0.8375, and Q(s, a) = 0.6892.

So my L would be 1/2*(1+0.8375-0.6892)^2 = 0.659296445
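The arithmetic above can be checked with a few lines of Python. The variable names are illustrative, and γ = 1 is an assumption inferred from the numbers, since the discount factor does not appear explicitly in the calculation.

```python
reward = 1.0
gamma = 1.0          # assumption: implied by the calculation above
max_q_next = 0.8375  # max_a' Q(s', a')
q_sa = 0.6892        # Q(s, a)

td_target = reward + gamma * max_q_next   # r + gamma * max_a' Q(s', a')
td_error = td_target - q_sa               # target minus prediction
loss = 0.5 * td_error ** 2                # squared-error loss with the 1/2 factor

print(td_target, td_error, loss)
```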

Now, how do I use the loss value above to update my neural network's weights, given that my model is structured like this:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(150, input_dim=150))
model.add(Dense(10))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')

Answer 1

Assuming the NN is modeling the Q-value function, you simply pass the target to the network, e.g.:

model.train_on_batch(state_action_vector, target)

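where state_action_vector is a preprocessed vector representing the state-action input to the network. Since your network uses an MSE loss function, it computes the prediction term from the state-action input on the forward pass and then updates the weights according to your target. You do not need to compute the loss value yourself.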
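To make the "predict, compare with the target, update" cycle concrete, here is a hand-written stand-in for a single training step. It is not Keras code: it assumes a tiny linear model and plain gradient descent instead of the Dense layers and Adam optimizer from the question, and the names `sgd_step`, `weights`, `bias` and `lr` are hypothetical. It also uses the squared error without the 1/2 factor, which only rescales the gradient by a constant.

```python
def sgd_step(weights, bias, x, target, lr=0.1):
    """One gradient-descent step on the squared error for a linear model."""
    # Forward pass: the model's current estimate of Q(s, a).
    prediction = sum(w * xi for w, xi in zip(weights, x)) + bias
    error = prediction - target
    loss = error ** 2  # MSE for a single sample
    # Backward pass: d(loss)/d(w_i) = 2 * error * x_i, d(loss)/d(bias) = 2 * error.
    new_weights = [w - lr * 2.0 * error * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * 2.0 * error
    return new_weights, new_bias, loss


# Hypothetical two-feature state-action vector and the target r + max Q(s', a').
x = [1.0, 0.5]
target = 1.0 + 0.8375

weights, bias = [0.0, 0.0], 0.0
losses = []
for _ in range(50):
    weights, bias, loss = sgd_step(weights, bias, x, target)
    losses.append(loss)

print(losses[0], losses[-1])  # the loss shrinks as the prediction approaches the target
```

One caveat about the model in the question: a sigmoid output is confined to the range (0, 1), so it cannot reach a target such as 1.8375. A linear output layer is a common choice when the network is supposed to predict unbounded Q-values.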

How do I update the weights in Keras for reinforcement learning?