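I am trying to build an A3C implementation in Keras. I have experience with Keras but none at all with TensorFlow, so I would really appreciate it if someone could keep this as simple as possible: I want to get it working quickly without going too deep into TensorFlow.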
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Reshape, LSTM, Dense
from keras.models import Model

# NUM_STATE, NUM_ACTIONS, LOSS_V, LOSS_ENTROPY and LEARNING_RATE
# are module-level constants defined elsewhere.

class Brain:  # class name as in the linked CartPole-A3C.py
    def __init__(self):
        self.session = tf.Session()
        K.set_session(self.session)
        K.manual_variable_initialization(True)
        self.stop_signal = False
        self.model = self._build_model()
        self.graph = self._build_graph(self.model)
        self.session.run(tf.global_variables_initializer())
        self.default_graph = tf.get_default_graph()
        self.default_graph.finalize()  # avoid modifications

    def _build_model(self):
        l_input = Input(batch_shape=(None, NUM_STATE))
        # Treat each state as a sequence of length 1 for the LSTM layers
        input_layer = Reshape((1, -1))(l_input)
        lstm = LSTM(64, activation='relu', return_sequences=True)(input_layer)
        lstm = LSTM(128, activation='relu', return_sequences=True)(lstm)
        lstm = LSTM(128, activation='relu')(lstm)
        out_actions = Dense(NUM_ACTIONS, activation='softmax')(lstm)
        out_value = Dense(1, activation='linear')(lstm)
        model = Model(inputs=[l_input], outputs=[out_actions, out_value])
        model._make_predict_function()  # have to initialize before threading
        return model

    def _build_graph(self, model):
        s_t = tf.placeholder(tf.float32, shape=(None, NUM_STATE))
        a_t = tf.placeholder(tf.float32, shape=(None, NUM_ACTIONS))
        r_t = tf.placeholder(tf.float32, shape=(None, 1))
        p, v = model(s_t)  # reuses the Keras model's own variables
        log_prob = tf.log(tf.reduce_sum(p * a_t, axis=1, keepdims=True) + 1e-10)
        advantage = r_t - v
        loss_policy = - log_prob * tf.stop_gradient(advantage)
        loss_value = LOSS_V * tf.square(advantage)
        # sum(p * log p) is the negative entropy; minimizing it encourages exploration
        entropy = LOSS_ENTROPY * tf.reduce_sum(p * tf.log(p + 1e-10), axis=1, keepdims=True)
        loss_total = tf.reduce_mean(loss_policy + loss_value + entropy)
        optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
        minimize = optimizer.minimize(loss_total)
        return s_t, a_t, r_t, minimize
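To sanity-check the sign of each loss term in `_build_graph` without touching TensorFlow, the same arithmetic can be sketched in plain Python. This is only an illustration: the coefficient values `LOSS_V = 0.5` and `LOSS_ENTROPY = 0.01` are assumptions (the post does not give them), and the helper names `a3c_loss_terms` and `a3c_loss` are made up for this sketch.

```python
import math

LOSS_V = 0.5          # assumed value-loss coefficient (not given in the post)
LOSS_ENTROPY = 0.01   # assumed entropy coefficient (not given in the post)
EPS = 1e-10           # same numerical guard as in _build_graph


def a3c_loss_terms(p, a_onehot, r, v):
    """Return (policy, value, entropy) loss terms for one sample.

    p        : action probabilities from the softmax head
    a_onehot : one-hot encoding of the action taken
    r        : the return fed in as r_t
    v        : value-head prediction
    """
    prob_taken = sum(pi * ai for pi, ai in zip(p, a_onehot))
    log_prob = math.log(prob_taken + EPS)
    advantage = r - v
    # The advantage is a plain number here, mirroring tf.stop_gradient
    loss_policy = -log_prob * advantage
    loss_value = LOSS_V * advantage ** 2
    # sum(p * log p) is the negative entropy, so this term is <= 0
    neg_entropy = LOSS_ENTROPY * sum(pi * math.log(pi + EPS) for pi in p)
    return loss_policy, loss_value, neg_entropy


def a3c_loss(batch):
    """Mean of the summed terms over a batch of (p, a_onehot, r, v) tuples."""
    totals = [sum(a3c_loss_terms(*sample)) for sample in batch]
    return sum(totals) / len(totals)
```

Comparing these numbers against `session.run` on the individual loss tensors for a couple of hand-picked samples is one cheap way to rule out a sign or shape mistake before looking elsewhere for convergence problems.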
It is then trained like this:
s_t, a_t, r_t, minimize = self.graph
self.session.run(minimize, feed_dict={s_t: s, a_t: a, r_t: r})
Predictions are made this way:
with self.default_graph.as_default():
    p, v = self.model.predict(s)
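So once training is done, I want to update my Keras model's weights with these gradients so that I can save it with model.save('path.h5'). Pseudocode:
model_weights = model.get_weights()
model_weights = apply_gradients(grads, model_weights)  # apply_gradients is pseudocode, not a Keras function
model.set_weights(model_weights)  # set_weights returns None, so do not reassign model
model.save('path.h5')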
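As far as I can tell, a manual step like this should not be necessary: `model(s_t)` in `_build_graph` reuses the Keras model's own variables, so `optimizer.minimize` already updates them in place each time it is run. Purely to illustrate what an "apply gradients" step does conceptually, here is a minimal sketch using plain SGD on nested Python lists. Note the assumptions: the question uses Adam rather than plain SGD, and `apply_gradients` below is a hypothetical helper, not part of Keras or TensorFlow.

```python
def apply_gradients(grads, weights, learning_rate=0.01):
    """Return new weights after one plain SGD step: w <- w - lr * g.

    grads and weights are lists of flat lists of floats with matching
    shapes (a stand-in for a model's list of weight arrays).
    """
    if len(grads) != len(weights):
        raise ValueError("grads and weights must have the same length")
    new_weights = []
    for w_row, g_row in zip(weights, grads):
        if len(w_row) != len(g_row):
            raise ValueError("shape mismatch between a weight and its gradient")
        new_weights.append([w - learning_rate * g for w, g in zip(w_row, g_row)])
    return new_weights
```

The function returns a new list rather than mutating its input, which is why the result has to be handed back to the model (in Keras, via `model.set_weights`) for the update to take effect.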
The code is from here, with almost no changes: https://github.com/jaara/AI-blog/blob/master/CartPole-A3C.py
I found a few things on this topic but couldn't figure out how to actually use them.
https://github.com/keras-team/keras/issues/3062
https://github.com/keras-team/keras/issues/3069
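Answer 0 (score: 0)
It turned out the problem was that the algorithm was not converging properly. Does anyone know what to do to make it converge? I am using a custom environment on which I previously trained a DQN, and that converged successfully. I also implemented a target model that is updated every 300 steps (1 episode in this case).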
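For reference, the periodic target-model update mentioned above could look roughly like the sketch below. It is a guess at the idea, not the actual code from this answer: the `TargetSync` class and its attribute names are hypothetical, and weights are represented as plain Python lists.

```python
import copy


class TargetSync:
    """Hard-copy online weights into target weights every `period` steps."""

    def __init__(self, period=300):
        self.period = period
        self.steps = 0
        self.target_weights = None

    def step(self, online_weights):
        """Call once per training step; returns True when a sync happened."""
        synced = self.steps % self.period == 0
        if synced:
            # deepcopy so later in-place changes to the online weights
            # do not leak into the target copy
            self.target_weights = copy.deepcopy(online_weights)
        self.steps += 1
        return synced
```

With Keras models the copy itself would typically be `target_model.set_weights(model.get_weights())`. One thing worth checking when debugging convergence is that the sync really fires on the intended steps, since an off-by-one in the counter can leave the target stale for a whole extra episode.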