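I am currently reusing the pacman codebase to train my own deep reinforcement learning model. Most of the components seem reasonable and understandable to me, but two points are still unclear: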
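How should the size of the replay memory be determined? Since I set the total number of learning steps to 4000 (note that in the referenced codebase this value is 4000000), I simply scaled replay_memory_size down proportionally to 400. Does this make sense?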
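What value does epsilon return when it is called, given that it is constructed with PiecewiseSchedule? I also scaled its parameters down proportionally, as follows: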
epsilon = PiecewiseSchedule([(0, 1.0),
                             (40, 1.0), # since we start training at 10000 steps
                             (80, 0.4),
                             (200, 0.2),
                             (400, 0.1),
                             (2000, 0.05)], outside_value=0.01)
replay_memory = PrioritizedReplayBuffer(replay_memory_size, replay_alpha)
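To make the first question concrete, here is a minimal sketch of what I understand a capacity of 400 to imply over 4000 steps. It assumes the buffer evicts its oldest entry once it is full; the `deque` stand-in and the `simulate_buffer` helper are my own illustration, not the codebase's PrioritizedReplayBuffer.

```python
from collections import deque


def simulate_buffer(capacity, total_steps):
    """Push one dummy transition per step into a FIFO buffer and return it.

    Hypothetical helper: a plain deque stands in for the real replay buffer,
    assuming oldest-first eviction once the capacity is reached.
    """
    buffer = deque(maxlen=capacity)  # oldest entries are dropped once full
    for step in range(total_steps):
        buffer.append(step)  # the step index stands in for a transition
    return buffer


buffer = simulate_buffer(capacity=400, total_steps=4000)
oldest_step, newest_step = buffer[0], buffer[-1]
fraction_retained = len(buffer) / 4000

print(oldest_step, newest_step, fraction_retained)
```

Under that assumption, only the most recent 400 of the 4000 transitions are still available for sampling at the end of the run.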
The original call in the referenced codebase is as follows:
epsilon = PiecewiseSchedule([(0, 1.0),
                             (10000, 1.0), # since we start training at 10000 steps
                             (20000, 0.4),
                             (50000, 0.2),
                             (100000, 0.1),
                             (500000, 0.05)], outside_value=0.01)
replay_memory = PrioritizedReplayBuffer(replay_memory_size, replay_alpha)
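To make the second question concrete, this is a minimal sketch of how I would expect a schedule like this to behave. It assumes linear interpolation between consecutive (step, value) endpoints and outside_value for any step not covered by them. The `piecewise_value` helper is hypothetical and written only for illustration; the codebase's PiecewiseSchedule may behave differently.

```python
def piecewise_value(endpoints, t, outside_value):
    """Return the schedule value at step t (hypothetical helper).

    Assumes linear interpolation inside each [left, right) segment and
    outside_value for any t that falls outside all segments.
    """
    for (left_t, left_v), (right_t, right_v) in zip(endpoints[:-1], endpoints[1:]):
        if left_t <= t < right_t:
            alpha = (t - left_t) / (right_t - left_t)
            return left_v + alpha * (right_v - left_v)
    return outside_value


# Endpoints copied from the scaled-down call in this question.
scaled = [(0, 1.0), (40, 1.0), (80, 0.4), (200, 0.2), (400, 0.1), (2000, 0.05)]

for t in (0, 40, 60, 80, 300, 2000, 3999):
    print(t, piecewise_value(scaled, t, outside_value=0.01))
```

If that reading is right, epsilon stays at 1.0 until step 40, decays piecewise linearly through the listed values, and is 0.01 from step 2000 onwards. Note also that the breakpoints are scaled by a factor of 250 (10000 to 40), whereas the total number of steps is scaled by a factor of 1000 (4000000 to 4000), so the last breakpoint sits at 50% of my run compared with 12.5% of the original one.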
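In general, what are the principles (guidelines) for choosing an appropriate replay memory size and for setting the arguments passed to PiecewiseSchedule? Thanks!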