For example, the state could actually be made up of the states at t and t-1:
S_t = [s_t, s_{t-1}]
I.e., does Proximal Policy Optimization already keep a state history, or is it implicit in the state (or neither)?
Answer (score: 1)
You could concatenate your observations. This is very common to do in RL. Usually in the Atari domain the last four frames are stacked into a single observation. This makes it possible for the agent to perceive change in the environment.
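A minimal sketch of that kind of observation concatenation, assuming a Gym-style environment whose `step` returns `(obs, reward, done, info)` (the `env` object and the stack size `k` are placeholders, not part of the original answer):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Concatenate the last k observations into one state S_t = [s_t, ..., s_{t-k+1}]."""

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with the first observation so the stack is always full.
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info
```

The stacked observation is then what you feed to the PPO policy, so the network can infer things like velocity from the difference between frames.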
A basic PPO algorithm does not keep track of state history by default. You could make this possible, though, by adding a recurrent layer.
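If you go the recurrent route, the idea is to put an LSTM (or GRU) between the feature extractor and the policy/value heads and carry its hidden state across timesteps. A rough PyTorch sketch under those assumptions (layer sizes and names are arbitrary, not from any specific PPO implementation):

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """Actor-critic network with an LSTM so the agent carries state history implicitly."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, act_dim)   # action logits
        self.value_head = nn.Linear(hidden, 1)          # state value

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim); hidden_state is carried between rollout steps.
        x = self.encoder(obs_seq)
        x, hidden_state = self.lstm(x, hidden_state)
        return self.policy_head(x), self.value_head(x), hidden_state
```

Note that training PPO with a recurrent policy also means storing and replaying the hidden states (or whole sequences) during the update, which is more involved than simple frame stacking.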