Question

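I'm trying to build a simple reinforcement learning agent that receives its reward from an external source. I don't know how to optimize the trainable parameters from an external reward.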

A similar question was posted here, but it has no accepted answer.

Here is a bare-bones version of my code:

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)

# a simple agent
a = tf.placeholder(dtype=tf.float32, shape=(), name='a')
b = tf.placeholder(dtype=tf.float32, shape=(), name='b')
w1 = tf.Variable(2.0, name='w1')
w2 = tf.Variable(2.0, name='w2')
y = w1*a + w2*b

# get and scale reward
scale = tf.constant(100.0)
reward = tf.placeholder(shape=(), dtype=tf.float32, name='reward')
loss = scale * (1-reward)

# optimize
optimizer = tf.train.AdamOptimizer()
trainable_params = tf.trainable_variables()
gradients = tf.gradients(loss, trainable_params)
train_op = optimizer.apply_gradients(zip(gradients, trainable_params))

def get_reward(action):
    return action/500.

with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    for i in range(1000):
        yp = s.run(y, feed_dict={ a: 5.0, b: 10.0 })    # get action

        tf_loss, _ = s.run([loss, train_op], feed_dict={ 
            a: 5.0, b: 10.0, 
            reward: get_reward(yp) 
        })

        if i % 100 == 0: print('loss: {}'.format(tf_loss))
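For contrast, here is a minimal pure-Python sketch (no TensorFlow, and plain gradient descent instead of Adam; both are assumptions) of what the optimizer needs: the loss expressed through the action `y`. Because the question's `get_reward` is simple enough to differentiate by hand, the chain rule gives dloss/dw1 = -scale·a/500 and dloss/dw2 = -scale·b/500. Feeding `reward` through a placeholder, as the code above does, removes exactly this y → reward link.

```python
# Pure-Python sketch (no TensorFlow): the question's loss with the reward
# computed from the action y, so the chain rule reaches w1 and w2.
SCALE = 100.0  # same value as the question's `scale` constant


def get_reward(action):
    """Same reward formula as the question's get_reward."""
    return action / 500.0


def loss_and_grads(w1, w2, a, b):
    """Return (loss, dloss/dw1, dloss/dw2) for loss = SCALE * (1 - get_reward(y))."""
    y = w1 * a + w2 * b  # the agent's action, as in the question
    loss = SCALE * (1.0 - get_reward(y))
    dloss_dy = -SCALE / 500.0  # hand-derived: d/dy of SCALE * (1 - y/500)
    return loss, dloss_dy * a, dloss_dy * b  # dy/dw1 = a, dy/dw2 = b


def train(steps=100, lr=0.01, a=5.0, b=10.0):
    """Plain gradient descent (an assumption; the question uses Adam)."""
    w1, w2 = 2.0, 2.0  # same initial values as the question's tf.Variables
    for _ in range(steps):
        _, g1, g2 = loss_and_grads(w1, w2, a, b)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2
```

With a = 5 and b = 10 the gradients are -1 and -2, so each step increases w1 and w2. This only works because `get_reward` is a known, differentiable formula; a truly external reward provides no such derivative.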

Since the reward/loss is not part of the graph, I get the following error:

Traceback (most recent call last):
  File "tf-rl.py", line 20, in <module>
    train_op = optimizer.apply_gradients(zip(gradients, trainable_params))
  File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 593, in apply_gradients
    ([str(v) for _, v, _ in converted_grads_and_vars],))
ValueError: No gradients provided for any variable: ["<tf.Variable 'w1:0' shape=() dtype=float32_ref>", "<tf.Variable 'w2:0' shape=() dtype=float32_ref>"].
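The error arises because `tf.gradients` finds no path from `loss` to `w1` or `w2`: `reward` is a placeholder, so the loss is constant with respect to the variables and every gradient comes back as `None`. One common way to train against a reward that cannot be differentiated is the score-function (REINFORCE) estimator, a technique swapped in here rather than taken from the question: make the action stochastic and weight the gradient of the sampled action's log-probability by the observed reward. Below is a minimal pure-Python sketch, assuming a Gaussian policy with a fixed, hypothetical noise level `sigma` and a mean-reward baseline. In the question's TensorFlow 1.x setting, the analogue would be to feed the sampled action and its reward through placeholders and minimize something like `-(reward - baseline) * log_prob(action)`, with `log_prob` computed from `y` so that it depends on `w1` and `w2`.

```python
import random


def get_reward(action):
    """Stand-in for the external reward; only its returned value is used."""
    return action / 500.0


def reinforce_grads(w1, w2, a, b, sigma, n_samples, rng):
    """Score-function (REINFORCE) estimate of d E[reward] / d(w1, w2)
    for a Gaussian policy: action ~ N(mu, sigma^2) with mu = w1*a + w2*b."""
    mu = w1 * a + w2 * b
    actions = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    rewards = [get_reward(x) for x in actions]  # treated as a black box
    baseline = sum(rewards) / n_samples  # mean-reward baseline lowers variance
    g_mu = 0.0
    for x, r in zip(actions, rewards):
        # d/dmu log N(x; mu, sigma^2) = (x - mu) / sigma^2
        g_mu += (r - baseline) * (x - mu) / sigma ** 2
    g_mu /= n_samples
    return g_mu * a, g_mu * b  # chain rule: dmu/dw1 = a, dmu/dw2 = b


def train(steps=200, lr=0.5, sigma=1.0, n_samples=64, seed=0, a=5.0, b=10.0):
    """Gradient ascent on expected reward (the question minimizes a loss instead)."""
    rng = random.Random(seed)
    w1, w2 = 2.0, 2.0  # same initial values as the question's tf.Variables
    for _ in range(steps):
        g1, g2 = reinforce_grads(w1, w2, a, b, sigma, n_samples, rng)
        w1 += lr * g1
        w2 += lr * g2
    return w1, w2
```

Because the update only uses reward values, `get_reward` could be replaced by any external call. The estimate is noisy; `sigma`, `n_samples`, and `lr` are illustrative hyperparameters, not values from the question.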

Feeding an external loss into a TensorFlow graph and computing gradients

0 answers