For example, the state could actually be made up of the states at t and t-1:
S_t = [s_t, s_{t-1}]
I.e., does Proximal Policy Optimization already keep a state history, or is it implicit in the state (or neither)?
Answer (score: 1)
You could concatenate your observations. This is very common to do in RL. Usually in the Atari domain the last four frames are stacked into a single observation. This makes it possible for the agent to perceive change in the environment.
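A minimal sketch of that kind of observation concatenation, assuming a Gym-style environment whose `step` returns `(obs, reward, done, info)` (the `env` object and the stack size `k` are placeholders, not part of the original answer):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Concatenate the last k observations into one state S_t = [s_t, ..., s_{t-k+1}]."""

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with the first observation so the stack is always full.
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info
```

The stacked observation is then what you feed to the PPO policy, so the network can infer things like velocity from the difference between frames.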
A basic PPO algorithm does not keep track of state history by default. You could make this possible, though, by adding a recurrent layer.
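If you go the recurrent route, the idea is to put an LSTM (or GRU) between the feature extractor and the policy/value heads and carry its hidden state across timesteps. A rough PyTorch sketch under those assumptions (layer sizes and names are arbitrary, not from any specific PPO implementation):

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """Actor-critic network with an LSTM so the agent carries state history implicitly."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, act_dim)   # action logits
        self.value_head = nn.Linear(hidden, 1)          # state value

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim); hidden_state is carried between rollout steps.
        x = self.encoder(obs_seq)
        x, hidden_state = self.lstm(x, hidden_state)
        return self.policy_head(x), self.value_head(x), hidden_state
```

Note that training PPO with a recurrent policy also means storing and replaying the hidden states (or whole sequences) during the update, which is more involved than simple frame stacking.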