I'm new to machine learning and have been working with PPO2 from the Stable Baselines Python library for reinforcement learning.
The relevant code is:
env = gym.make('fishing-v0')
model = PPO2(MlpPolicy, env, verbose=2)
obs = env.reset()
for i in range(100):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
env.close()
How would I plot the timesteps on the x-axis against the reward on the y-axis?
Here is what I tried:
def step():
    obs = env.reset()
    for i in range(100):
        action, _states = loaded_model.predict(obs)
        obs, rewards, dones, info = env.step(action)
        while i <= 100:
            x = []
            x.append(i)
            i += 1
        return x
    env.render()

step()
y = rewards
plt.figure(figsize=(9, 4))
plt.plot(x, y, linewidth=2.0)
plt.show()
But all I get is a blank plot with empty x and y axes.
Thanks
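A likely cause, offered as a guess rather than a confirmed diagnosis: in the attempt above, `x = []` sits inside the loop, so the list is recreated on every pass and ends up holding a single value; the list returned by `step()` is never assigned to anything; and `rewards` only holds the reward of one step. Matplotlib draws nothing visible for a single point when it uses a line style without a marker, which would match the blank axes. The sketch below shows one way to record the reward at every timestep and plot the two lists. It assumes the old Gym API from the question (`env.step` returns four values, `model.predict` returns an `(action, state)` pair). `collect_rewards`, `plot_rewards`, `ToyEnv`, and `ToyModel` are hypothetical names introduced here; the toy classes only stand in for the real environment and model so the example runs on its own.

```python
def collect_rewards(env, model, n_steps=100):
    """Run the policy for n_steps and record the reward at every step.

    Assumes env.step returns (obs, reward, done, info) and
    model.predict returns (action, state), as in the question.
    """
    timesteps, rewards = [], []  # created once, before the loop
    obs = env.reset()
    for t in range(n_steps):
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
        timesteps.append(t)
        rewards.append(reward)
        if done:  # start a new episode so the rollout keeps going
            obs = env.reset()
    return timesteps, rewards


def plot_rewards(timesteps, rewards):
    """Plot reward (y-axis) against timestep (x-axis)."""
    import matplotlib.pyplot as plt  # imported here so collection works without it

    plt.figure(figsize=(9, 4))
    plt.plot(timesteps, rewards, linewidth=2.0)
    plt.xlabel("timestep")
    plt.ylabel("reward")
    plt.show()


# Hypothetical stand-ins so the sketch runs without gym or stable-baselines.
class ToyEnv:
    """Reward equals the action; the episode ends every 5 steps."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), float(action), self.t % 5 == 0, {}


class ToyModel:
    def predict(self, obs):
        return 1.0, None  # (action, state), mirroring model.predict


timesteps, rewards = collect_rewards(ToyEnv(), ToyModel(), n_steps=10)
```

With the objects from the question, the equivalent calls would be `timesteps, rewards = collect_rewards(env, loaded_model, 100)` followed by `plot_rewards(timesteps, rewards)`. If the environment is vectorized, each recorded reward is an array with one entry per sub-environment, so you may need to pick an index before plotting.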