Question

I'm using RLlib's PPOTrainer with a custom environment. I call trainer.train() twice: the first call completes successfully, but the second one crashes with this error:

lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call (pid=15248)
    raise type(e)(node_def, op, message) (pid=15248)

tensorflow.python.framework.errors_impl.InvalidArgumentError:

Received a label value of 5 which is outside the valid range of [0, 5).  Label values: 5 5

(pid=15248) [[node default_policy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /tensorflow_core/python/framework/ops.py:1751) ]]
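What the check means: the node name suggests this is the log-probability of the actions the policy took, with those actions used as the labels of a sparse softmax cross-entropy. For a `Discrete(5)` action space, every label must therefore lie in `[0, 5)`, so an action value of 5 is rejected. The pure-Python sketch below is not TensorFlow's implementation. It only mirrors the range check and the single-sample loss, and the function name is hypothetical:

```python
import math
from typing import List


def sparse_softmax_cross_entropy(logits: List[float], label: int) -> float:
    """Pure-Python sketch of sparse softmax cross-entropy for one sample.

    Like the TensorFlow op, it rejects labels outside [0, num_classes).
    """
    num_classes = len(logits)
    if not 0 <= label < num_classes:
        raise ValueError(
            f"Received a label value of {label} which is outside the "
            f"valid range of [0, {num_classes})."
        )
    # Numerically stable log-softmax: subtract the largest logit first.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


logits = [0.1, 0.2, 0.3, 0.4, 0.5]  # one logit per action, as in Discrete(5)
print(sparse_softmax_cross_entropy(logits, 4))  # valid: the last action
try:
    sparse_softmax_cross_entropy(logits, 5)  # action 5 does not exist
except ValueError as err:
    print(err)
```

With five logits, the valid labels are 0 to 4, which is exactly the `[0, 5)` range in the message.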

Here is my code:

main.py

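import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.models import ModelCatalog

from MyEnv import MyEnv  # the environment defined in MyEnv.py below
# TreeObsPreprocessor is the asker's own Preprocessor subclass (not shown).

ModelCatalog.register_custom_preprocessor("tree_obs_prep", TreeObsPreprocessor)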
ray.init()

trainer = PPOTrainer(env=MyEnv, config={
    "train_batch_size": 4000,
    "model": {
        "custom_preprocessor": "tree_obs_prep"
    }
})

for i in range(2):
    print(trainer.train())

MyEnv.py

import gym
import numpy as np
from ray.rllib.env import MultiAgentEnv


class MyEnv(MultiAgentEnv):
    def __init__(self, env_config):
        self.n_agents = 2

        self.env = *CREATES ENV*
        self.action_space = gym.spaces.Discrete(5)
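        # RLlib expects a gym Space here rather than a NumPy array; the Box
        # bounds are an assumption (the original only gives the shape (1, 12)).
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1, 12))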

    def reset(self):
        self.agents_done = []
        obs = self.env.reset()
        return obs[0]

    def step(self, action_dict):
        obs, rewards, dones, infos = self.env.step(action_dict)

        d = dict()
        r = dict()
        o = dict()
        i = dict()
        # Only report agents that have not finished yet.
        for i_agent in range(len(self.env.agents)):
            if i_agent not in self.agents_done:
                o[i_agent] = obs[i_agent]
                r[i_agent] = rewards[i_agent]
                d[i_agent] = dones[i_agent]
                i[i_agent] = infos[i_agent]
        d['__all__'] = dones['__all__']

        # Remember finished agents so they are skipped on later steps.
        for agent, done in dones.items():
            if done and agent != '__all__':
                self.agents_done.append(agent)

        return o, r, d, i

I don't know what the problem is. Any suggestions? And what does this error mean?

Answer 1

This comment really helped me:

FWIW, I think this kind of problem can happen if NaNs show up in the policy output. When that happens, you can get out-of-range errors.
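Why would a NaN produce exactly the out-of-range label 5 from the question? TensorFlow's categorical sampler has been reported to return an index equal to the number of classes when the logits contain NaN, and that index then fails the `[0, 5)` check in the loss. The inverse-CDF sampler below is only a pure-Python sketch that mimics this reported behavior, not TensorFlow's code; `sample_categorical` and its optional `u` argument are hypothetical:

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(logits: List[float], u: Optional[float] = None) -> int:
    """Inverse-CDF sampling from softmax(logits).

    If any probability is NaN, every `u < cumulative` comparison is False,
    so the loop falls through without picking a valid action.
    """
    if u is None:
        u = random.random()
    probs = softmax(logits)
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    # Reached either through floating-point round-off (valid probabilities)
    # or through NaN probabilities. For NaN, mimic the reported TF behavior
    # of returning one past the last valid action.
    if any(math.isnan(p) for p in probs):
        return len(logits)
    return len(logits) - 1


good = [0.1, 0.2, 0.3, 0.4, 0.5]
bad = [0.1, float("nan"), 0.3, 0.4, 0.5]
print(sample_categorical(good))         # some action in 0..4
print(sample_categorical(bad, u=0.5))   # → 5, i.e. outside [0, 5)
```

Infinite logits have the same effect, because `inf - inf` evaluates to NaN inside the softmax.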

It's usually caused by the observations or rewards somehow becoming NaN, although it could also be the policy diverging.
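A cheap way to test the first explanation is to check every observation and reward before `step` returns them. `assert_finite` below is a hypothetical helper, not part of RLlib. It accepts nested lists and anything with a `tolist()` method (such as NumPy arrays), so it needs no NumPy import:

```python
import math
from typing import Any, Dict, Iterable


def _flatten(value: Any) -> Iterable[float]:
    """Yield every number in a (possibly nested) list, tuple, or array."""
    if hasattr(value, "tolist"):  # NumPy arrays/scalars -> plain Python objects
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _flatten(item)
    else:
        yield float(value)


def assert_finite(obs: Dict[Any, Any], rewards: Dict[Any, float]) -> None:
    """Raise ValueError if any per-agent observation or reward is NaN or inf."""
    for agent, agent_obs in obs.items():
        if not all(math.isfinite(x) for x in _flatten(agent_obs)):
            raise ValueError(f"non-finite observation for agent {agent}: {agent_obs}")
    for agent, reward in rewards.items():
        if not math.isfinite(float(reward)):
            raise ValueError(f"non-finite reward for agent {agent}: {reward}")


assert_finite({0: [[1.0, 2.0]], 1: [[0.5, -3.0]]}, {0: 1.0, 1: 0.0})  # passes
```

In `MyEnv.step`, calling `assert_finite(o, r)` just before `return o, r, d, i` would stop the rollout at the first bad step, instead of surfacing later as an out-of-range label during training.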

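In my case, I had to modify my observations, because the agent was failing to learn a policy, and at some point during training (at a random timestep) the returned action was NaN.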
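The answer does not say how the observations were changed. Purely as an illustration (the helper name `sanitize_obs` and the `[-10, 10]` range are assumptions, not the answerer's fix), one simple way to keep a flat list of observation values finite and bounded is to replace NaN and clip everything else:

```python
import math
from typing import List


def sanitize_obs(obs: List[float], low: float = -10.0, high: float = 10.0) -> List[float]:
    """Replace NaN with 0.0 and clip every value (including +/-inf) to [low, high]."""
    cleaned = []
    for x in obs:
        if math.isnan(x):
            x = 0.0
        cleaned.append(min(max(x, low), high))
    return cleaned


print(sanitize_obs([1.5, float("nan"), float("inf"), -50.0]))  # → [1.5, 0.0, 10.0, -10.0]
```

Clipping bounds the inputs the policy network sees, but if the environment itself produces NaN, it is still worth finding out where the NaN originates rather than silently replacing it.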

RLLib - Tensorflow - InvalidArgumentError: Received label value N outside the valid range [0, N)
