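I'm trying to build a simple reinforcement learning agent that receives its reward from an external source. I don't know how to optimize the trainable parameters using that external reward.
A similar question was posted here, but it has no accepted answer.
Here's a barebones version of my code: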
```python
import tensorflow as tf  # uses the TensorFlow 1.x API (tf.placeholder, tf.Session)

# a simple agent
a = tf.placeholder(dtype=tf.float32, shape=(), name='a')
b = tf.placeholder(dtype=tf.float32, shape=(), name='b')
w1 = tf.Variable(2.0, name='w1')
w2 = tf.Variable(2.0, name='w2')
y = w1*a + w2*b  # the agent's action

# get and scale reward
scale = tf.constant(100.0)
reward = tf.placeholder(shape=(), dtype=tf.float32, name='reward')
loss = scale * (1-reward)

# optimize
optimizer = tf.train.AdamOptimizer()
trainable_params = tf.trainable_variables()
gradients = tf.gradients(loss, trainable_params)
train_op = optimizer.apply_gradients(zip(gradients, trainable_params))

def get_reward(action):
    # external reward, computed in plain Python outside the graph
    return action/500.

with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    for i in range(1000):
        yp = s.run(y, feed_dict={ a: 5.0, b: 10.0 })  # get action
        tf_loss, _ = s.run([loss, train_op], feed_dict={
            a: 5.0, b: 10.0,
            reward: get_reward(yp)
        })
        if i % 100 == 0: print('loss: {}'.format(tf_loss))
```
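To see what the loop above actually gives the optimizer, here is a plain-Python sketch (no TensorFlow; the names loss_through_action, loss_with_fed_reward and numeric_grad are hypothetical) that compares two versions of the same loss. When the reward is computed from the action y, the loss changes with w1 and w2; when the reward arrives as a plain number, as with the fed `reward` placeholder, the loss is constant in w1 and w2, so there is nothing to differentiate. In TensorFlow 1.x, `tf.gradients` returns `None` for variables that have no path to the loss, and `apply_gradients` raises when every gradient is `None`.

```python
SCALE = 100.0
A, B = 5.0, 10.0  # the fixed inputs fed to a and b in the question

def get_reward(action):
    return action / 500.0

def loss_through_action(w1, w2):
    # Reward is computed from the action, so the loss depends on w1 and w2
    y = w1 * A + w2 * B
    return SCALE * (1.0 - get_reward(y))

def loss_with_fed_reward(w1, w2, reward):
    # Mirrors the question's graph: the reward is an external number, w1/w2 are unused
    return SCALE * (1.0 - reward)

def numeric_grad(f, w1, w2, eps=1e-3):
    # Central finite differences with respect to w1 and w2
    g1 = (f(w1 + eps, w2) - f(w1 - eps, w2)) / (2 * eps)
    g2 = (f(w1, w2 + eps) - f(w1, w2 - eps)) / (2 * eps)
    return g1, g2

# Chain rule: d loss/d w1 = -SCALE * A / 500, d loss/d w2 = -SCALE * B / 500
print(numeric_grad(loss_through_action, 2.0, 2.0))  # approximately (-1.0, -2.0)

fed = get_reward(2.0 * A + 2.0 * B)  # reward computed once, then fed in as a constant
print(numeric_grad(lambda w1, w2: loss_with_fed_reward(w1, w2, fed), 2.0, 2.0))  # (0.0, 0.0)
```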
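Because the reward (and therefore the loss) is fed in from outside and is not connected to w1 and w2 in the graph, I get the following error: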
```
Traceback (most recent call last):
  File "tf-rl.py", line 20, in <module>
    train_op = optimizer.apply_gradients(zip(gradients, trainable_params))
  File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 593, in apply_gradients
    ([str(v) for _, v, _ in converted_grads_and_vars],))
ValueError: No gradients provided for any variable: ["<tf.Variable 'w1:0' shape=() dtype=float32_ref>", "<tf.Variable 'w2:0' shape=() dtype=float32_ref>"].
```
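One common way to learn from a reward that comes from outside the graph, which the original post does not use, is the score-function (REINFORCE) policy-gradient estimator: treat the action as a sample from a distribution whose mean is y, and increase the log-probability of each sampled action in proportion to how much its reward beat a running baseline. The reward function itself is never differentiated. Below is a minimal plain-Python sketch with a Gaussian policy; SIGMA, LR, the baseline update rate of 0.1 and the train function are hypothetical choices, not values from the question.

```python
import random

A, B = 5.0, 10.0  # fixed observation, as in the question
SIGMA = 1.0       # hypothetical exploration noise (std of the Gaussian policy)
LR = 0.01         # hypothetical learning rate

def get_reward(action):
    # Black-box reward from the question; the learner never differentiates it
    return action / 500.0

def train(steps=2000, seed=0):
    rng = random.Random(seed)
    w1, w2 = 2.0, 2.0
    baseline = 0.0  # running average of rewards, reduces gradient variance
    for _ in range(steps):
        mu = w1 * A + w2 * B           # policy mean (the question's y)
        action = rng.gauss(mu, SIGMA)  # sample an action around the mean
        r = get_reward(action)         # external, non-differentiable reward
        advantage = r - baseline
        # d/dmu of log N(action; mu, SIGMA^2) is (action - mu) / SIGMA^2
        dlogp_dmu = (action - mu) / SIGMA ** 2
        # Chain rule through mu = w1*A + w2*B; gradient ascent on expected reward
        w1 += LR * advantage * dlogp_dmu * A
        w2 += LR * advantage * dlogp_dmu * B
        baseline += 0.1 * (r - baseline)
    return w1, w2

w1, w2 = train()
print(w1, w2)  # both end up above their initial value of 2.0
```

With this linear reward the weights simply keep growing, because a larger action always earns more reward; the question's loss 100 * (1 - reward) likewise has no minimum. In a TensorFlow 1.x graph the same idea would mean feeding the sampled action and the reward as placeholders and minimizing -(reward - baseline) * log_prob(action), with log_prob built from y, so that the loss is connected to w1 and w2.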