Application Error Collection

How do I find the distance traveled by the "Humanoid-v2" agent after training?

Posted: 2018-12-11 19:50:09

Tags: xml reinforcement-learning openai-gym

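I trained "Humanoid-v2" (https://github.com/openai/gym/wiki/Humanoid-V1) to walk. During training, the reward increases. However, I also need a performance metric that tells me how far the agent has walked. How can I get one?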

This agent (https://github.com/openai/gym/wiki/Humanoid-V1) has 376 observation values. Which of them corresponds to the x, y, z position mentioned on line 27 of the agent's XML file: https://github.com/openai/gym/blob/master/gym/envs/mujoco/assets/humanoid.xml#L27?

Thanks

1 answer:

Answer 0 (score: 1)

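Let the initial position be (x1, y1, z1) and the position after a step be (x2, y2, z2).
The distance the agent travels during that step is the Euclidean distance between the two positions: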

dist = tf.sqrt(tf.add_n([tf.math.squared_difference(x2, x1), tf.math.squared_difference(y2, y1), tf.math.squared_difference(z2, z1)]))
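The same per-step distance can be computed in plain Python without TensorFlow. This is a minimal sketch assuming each position is an (x, y, z) tuple; the function name `step_distance` is hypothetical.

```python
import math

def step_distance(p1, p2):
    """Euclidean distance between two (x, y, z) positions."""
    x1, y1, z1 = p1
    x2, y2, z2 = p2
    # Sum the squared differences along each axis, then take the square root
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)

print(step_distance((0.0, 0.0, 1.4), (3.0, 4.0, 1.4)))  # 5.0
```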

Sum these per-step distances until the end of the episode to get the total distance traveled.
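A sketch of accumulating the per-step distance over one episode. It assumes a mujoco-py based gym version (old 4-tuple `step` API), where the torso position can be read from `env.unwrapped.sim.data.qpos[:3]`, i.e. the root free joint. That accessor, and the helper names `torso_position` and `run_episode_distance`, are assumptions rather than something the answer specifies. In gym's Humanoid-v2 source the observation appears to be built from `qpos.flat[2:]`, so x and y are likely not among the 376 observation values at all; this is why the sketch reads them from the simulator instead.

```python
import math

def torso_position(env):
    # Assumption: mujoco-py based gym env; qpos[0:3] holds the x, y, z
    # of the root free joint (the torso) in world coordinates.
    qpos = env.unwrapped.sim.data.qpos
    return (float(qpos[0]), float(qpos[1]), float(qpos[2]))

def run_episode_distance(env, policy, max_steps=1000):
    """Roll out one episode; return (path_length, net_x_displacement)."""
    obs = env.reset()
    start = prev = torso_position(env)
    path_length = 0.0
    for _ in range(max_steps):
        obs, reward, done, info = env.step(policy(obs))
        cur = torso_position(env)
        # Per-step Euclidean distance, accumulated until the episode ends
        path_length += math.dist(prev, cur)
        prev = cur
        if done:
            break
    # Net forward progress along x, measured from the start of the episode
    return path_length, prev[0] - start[0]
```

Path length also counts sideways and vertical movement such as bobbing, so if only forward progress matters, the returned net x displacement may be the more useful walking metric.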