Updating a Keras model using TF gradients

Time: 2018-06-11 22:28:00

Tags: tensorflow keras python-3.6

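I'm trying to build an A3C implementation in Keras. I have experience with Keras but none at all with TensorFlow, so I'd really appreciate it if someone could keep the explanation as simple as possible. I want to get this finished quickly without going too deep into TensorFlow.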

    self.session = tf.Session()
    K.set_session(self.session)
    K.manual_variable_initialization(True)
    self.stop_signal = False

    self.model = self._build_model()
    self.graph = self._build_graph(self.model)

    self.session.run(tf.global_variables_initializer())
    self.default_graph = tf.get_default_graph()

    self.default_graph.finalize()    # avoid modifications

def _build_model(self):

    l_input = Input(batch_shape=(None, NUM_STATE))
    input_layer = Reshape((1, -1))(l_input)

    lstm = LSTM(64, activation='relu', return_sequences=True)(input_layer)
    lstm = LSTM(128, activation='relu', return_sequences=True)(lstm)
    lstm = LSTM(128, activation='relu')(lstm)

    out_actions = Dense(NUM_ACTIONS, activation='softmax')(lstm)
    out_value = Dense(1, activation='linear')(lstm)

    model = Model(inputs=[l_input], outputs=[out_actions, out_value])
    model._make_predict_function()  # have to initialize before threading

    return model

def _build_graph(self, model):
    s_t = tf.placeholder(tf.float32, shape=(None, NUM_STATE))
    a_t = tf.placeholder(tf.float32, shape=(None, NUM_ACTIONS))
    r_t = tf.placeholder(tf.float32, shape=(None, 1))

    p, v = model(s_t)

    log_prob = tf.log(tf.reduce_sum(p * a_t, axis=1, keepdims=True) + 1e-10)
    advantage = r_t - v

    loss_policy = - log_prob * tf.stop_gradient(advantage)
    loss_value = LOSS_V * tf.square(advantage)
    entropy = LOSS_ENTROPY * tf.reduce_sum(p * tf.log(p + 1e-10), axis=1, keepdims=True)

    loss_total = tf.reduce_mean(loss_policy + loss_value + entropy)
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE)
    minimize = optimizer.minimize(loss_total)

    return s_t, a_t, r_t, minimize
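
To make the loss in `_build_graph` easier to check by hand, here is a minimal pure-Python sketch of the same three terms (policy loss, value loss, entropy term) evaluated on a small batch. The function name `a3c_loss` and the values of `LOSS_V` and `LOSS_ENTROPY` are assumptions, since the post does not show them. The sketch only evaluates the loss; it does not compute gradients, so `tf.stop_gradient` has no counterpart here.

```python
import math

# Assumed coefficients: the post does not show the real LOSS_V / LOSS_ENTROPY values.
LOSS_V = 0.5
LOSS_ENTROPY = 0.01
EPS = 1e-10  # same small constant the graph adds inside tf.log


def a3c_loss(p, v, a, r):
    """Mean loss over a batch, mirroring the terms in _build_graph.

    p: rows of action probabilities, v: value estimates,
    a: one-hot rows of the actions taken, r: target returns.
    """
    total = 0.0
    for p_i, v_i, a_i, r_i in zip(p, v, a, r):
        # log-probability of the action that was actually taken
        log_prob = math.log(sum(pi * ai for pi, ai in zip(p_i, a_i)) + EPS)
        advantage = r_i - v_i
        loss_policy = -log_prob * advantage
        loss_value = LOSS_V * advantage ** 2
        # sum(p * log p) is the negative entropy, so adding it rewards exploration
        neg_entropy = LOSS_ENTROPY * sum(pi * math.log(pi + EPS) for pi in p_i)
        total += loss_policy + loss_value + neg_entropy
    return total / len(p)


# One sample, two actions, uniform policy, value estimate 0, return 1.
loss = a3c_loss(p=[[0.5, 0.5]], v=[0.0], a=[[1, 0]], r=[1.0])
```

Comparing a hand-computed value like this against `session.run(loss_total, ...)` on the same inputs is one way to confirm the graph computes what is intended (this would require also returning `loss_total` from `_build_graph`).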

It is then trained like this:

s_t, a_t, r_t, minimize = self.graph
self.session.run(minimize, feed_dict={s_t: s, a_t: a, r_t: r})

Prediction is done this way:

with self.default_graph.as_default():
    p, v = self.model.predict(s)

So once training is finished, I'd like to update my Keras model's weights with these gradients so that I can save the model with model.save('path.h5'). Pseudocode:

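# pseudocode: apply_gradients and grads are placeholders, not real API
model_weights = model.get_weights()
model_weights = apply_gradients(grads, model_weights)
model.set_weights(model_weights)  # set_weights updates in place and returns None
model.save('path.h5')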
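
For reference, "applying gradients to weights" in the pseudocode above amounts to an update rule such as a plain gradient-descent step. The sketch below uses nested Python lists in place of weight arrays; the helper `apply_gradients`, the learning rate, and the sample numbers are all hypothetical. The graph in the question uses `tf.train.AdamOptimizer`, whose update rule is different.

```python
def apply_gradients(grads, weights, learning_rate=0.1):
    """Return new weights after one plain gradient-descent step.

    Hypothetical helper: grads and weights are lists of per-layer lists
    with matching shapes.
    """
    return [
        [w - learning_rate * g for w, g in zip(w_layer, g_layer)]
        for w_layer, g_layer in zip(weights, grads)
    ]


weights = [[1.0, 2.0], [3.0]]   # made-up weights for two "layers"
grads = [[0.5, -1.0], [2.0]]    # made-up gradients of the same shape
new_weights = apply_gradients(grads, weights)
```

Because `p, v = model(s_t)` builds the loss on the Keras model's own variables, running `minimize` in the same session should already update those weights in place. If so, no manual step like this is needed before calling `model.save('path.h5')`.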

The code is taken from here, with almost no changes: https://github.com/jaara/AI-blog/blob/master/CartPole-A3C.py

I found some material on this topic but couldn't figure out how to actually use it. https://github.com/keras-team/keras/issues/3062
https://github.com/keras-team/keras/issues/3069

1 answer:

Answer 0 (score: 0)

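It turned out that the real problem was that the algorithm was not converging properly. Does anyone know what to do to make it converge? I'm using a custom environment on which I have trained a DQN in the past, and that converged successfully. I also implemented a target model that is updated every 300 steps (1 episode in this case).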
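
The periodic target-model update mentioned above can be sketched as follows. This is a minimal illustration rather than the poster's code: the class name `TargetSync` and the use of plain lists for weights are assumptions, and only the interval of 300 steps comes from the answer.

```python
import copy

TARGET_UPDATE_EVERY = 300  # from the answer: one episode in the poster's environment


class TargetSync:
    """Keeps a frozen copy of the online weights, refreshed every N steps."""

    def __init__(self, online_weights, every=TARGET_UPDATE_EVERY):
        self.every = every
        self.steps = 0
        self.target_weights = copy.deepcopy(online_weights)

    def step(self, online_weights):
        """Call once per training step; returns True when the target was refreshed."""
        self.steps += 1
        if self.steps % self.every == 0:
            self.target_weights = copy.deepcopy(online_weights)
            return True
        return False


# Tiny demonstration with an interval of 3 steps instead of 300.
online = [1.0]
sync = TargetSync(online, every=3)
online[0] = 2.0  # the online weights change during training
refreshed = [sync.step(online) for _ in range(3)]
```

With Keras models the copy would typically be done with `target_model.set_weights(model.get_weights())` instead of `copy.deepcopy`.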