Can history be included in Proximal Policy Optimization?

Time: 2017-11-14 13:26:35

Tags: machine-learning state reinforcement-learning

For example, the state could actually be composed of the states at t and t-1:

S_t = [s_t, s_{t-1}]

That is, does Proximal Policy Optimization already include state history, or is it implicit in the state (or neither)?

1 answer:

Answer 0: (score: 1)

You could concatenate your observations. This is very common to do in RL. In the Atari domain, for example, the last four frames are usually joined into a single observation. This makes it possible for the agent to perceive change in the environment.
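A minimal sketch of that idea, assuming NumPy (the class name ObservationStack and the stack size k=4 are illustrative, not part of any particular library):

```python
import numpy as np
from collections import deque

class ObservationStack:
    """Keep the last k observations and concatenate them into one state,
    i.e. S_t = [s_t, s_{t-1}, ..., s_{t-k+1}]."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs):
        # Fill the buffer with the first observation so the stack is full from the start.
        for _ in range(self.k):
            self.frames.append(obs)
        return self._state()

    def step(self, obs):
        self.frames.append(obs)
        return self._state()

    def _state(self):
        # Concatenate along the last axis (e.g. the channel axis for image frames).
        return np.concatenate(list(self.frames), axis=-1)
```

You would then feed the stacked state to the PPO policy instead of the raw observation, so the network sees the last k time steps at once.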

A basic PPO algorithm does not keep track of state history by default. You could make this possible, though, by adding a recurrent layer.
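A minimal sketch of what such a recurrent policy might look like, assuming PyTorch (the class name RecurrentPolicy and the layer sizes are illustrative, not taken from any specific PPO implementation):

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Policy/value network with an LSTM so the agent can carry information
    across time steps instead of stacking observations."""

    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        # The LSTM's hidden state summarizes past observations.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, n_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim)
        x = torch.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)
        return self.policy_head(x), self.value_head(x), hidden
```

With this approach you also need to carry the hidden state between rollout steps and reset it at episode boundaries, which makes the training loop a bit more involved than plain frame stacking.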