Reinforcement learning: plotting timesteps vs. rewards

Date: 2020-07-31 18:23:18

Tags: python machine-learning reinforcement-learning

I'm very new to machine learning and have been studying PPO2 for reinforcement learning using the Python library stable-baselines.

Here is the relevant code:


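import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

# 'fishing-v0' must already be registered with gym by the package that provides it
env = gym.make('fishing-v0')
model = PPO2(MlpPolicy, env, verbose=2)

obs = env.reset()
for i in range(100):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

env.close()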

How would I plot the timesteps on the x-axis and the rewards on the y-axis?
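One common approach is to append the step index and the reward to two lists on every pass through the loop, and only plot once the loop has finished. The sketch below assumes the classic gym API used in the question (`reset()` returns an observation, `step()` returns `(obs, reward, done, info)`). `ToyEnv`, `ToyModel`, `rollout` and `plot_rewards` are hypothetical names: the two toy classes only stand in for `gym.make('fishing-v0')` and the PPO2 model so the sketch runs without gym or stable-baselines installed.

```python
from typing import Any, List, Tuple


class ToyEnv:
    """Hypothetical stand-in for gym.make('fishing-v0') (classic gym API)."""

    def __init__(self, episode_length: int = 30):
        self.episode_length = episode_length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        reward = float(action)              # toy reward: simply echo the action
        done = self.t >= self.episode_length
        return self.t, reward, done, {}


class ToyModel:
    """Hypothetical stand-in for a PPO2 model: predict() returns (action, state)."""

    def predict(self, obs: Any) -> Tuple[int, None]:
        return obs % 2, None


def rollout(env, model, n_steps: int) -> Tuple[List[int], List[float]]:
    """Run the policy for n_steps and record one (timestep, reward) pair per step."""
    timesteps: List[int] = []               # created once, outside the loop
    rewards: List[float] = []
    obs = env.reset()
    for t in range(n_steps):
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
        timesteps.append(t)
        rewards.append(float(reward))
        if done:                            # start a new episode rather than stepping a finished one
            obs = env.reset()
    return timesteps, rewards


def plot_rewards(timesteps: List[int], rewards: List[float]) -> None:
    import matplotlib.pyplot as plt         # imported here so rollout() works without matplotlib
    plt.figure(figsize=(9, 4))
    plt.plot(timesteps, rewards, linewidth=2.0)
    plt.xlabel("timestep")
    plt.ylabel("reward")
    plt.show()
```

With the real objects from the question (assuming a single, non-vectorized env), the call would look like `timesteps, rewards = rollout(env, model, 100)` followed by `plot_rewards(timesteps, rewards)`. Keeping collection and plotting in separate functions makes it easy to check the lists before anything is drawn.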

I tried:


import matplotlib.pyplot as plt

def step():
  obs = env.reset()
  for i in range(100):
    action, _states = loaded_model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    while i <= 100:
      x = []
      x.append(i)
      i += 1
      return x
    env.render()

step()

y = rewards
plt.figure(figsize=(9, 4))
plt.plot(x, y, linewidth=2.0)
plt.show()

But all I get is an empty plot with blank x and y axes:

[Plot image: empty x and y axes]
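Stripping the environment calls out of the attempted `step()` shows why so little data comes back: `x = []` runs inside the loop, and `return x` exits `step()` during the very first iteration, so it returns `[0]` after a single environment step (and `env.render()` is never reached). The result is also discarded because `step()` is called without assigning its return value, and `rewards` inside `step()` is a local variable, so the module-level `y = rewards` can at best pick up the single reward left over from the first snippet's loop. A minimal sketch of just the control flow, with hypothetical function names:

```python
from typing import List, Optional


def step_as_written(n: int) -> Optional[List[int]]:
    """Same control flow as the attempted step(), with the env calls removed."""
    for i in range(n):
        while i <= n:
            x = []              # a fresh list on every pass
            x.append(i)
            i += 1
            return x            # leaves the function during the first iteration
    return None


def step_fixed(n: int) -> List[int]:
    x = []                      # created once, before the loop
    for i in range(n):
        x.append(i)             # one entry per timestep
    return x                    # returned only after the loop has finished


print(step_as_written(100))     # [0]
print(len(step_fixed(100)))     # 100
```

In the real code, the rewards would be collected into a second list the same way, and both lists returned from `step()` and unpacked at the call site so the plotting lines receive them.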

Thanks

0 answers