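I want to record a video of MountainCar reaching the goal several times (flag position > 0.5). I am using OpenAI's MountainCar-v0 with modified step and reward limits, but it takes a very long time to reach the goal. I am using the following code: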
import time

import numpy as np
import gym
from gym import wrappers

# Register a MountainCar variant with a much longer episode limit.
gym.envs.register(
    id='MountainCarMyVersion-v0',
    entry_point='gym.envs.classic_control:MountainCarEnv',
    max_episode_steps=200000,  # MountainCar-v0 uses 200
    reward_threshold=-1000.0,
)

env = gym.make('MountainCarMyVersion-v0')
env = wrappers.Monitor(env, '/home/video', force=True)  # records the video

game_terminator = 0
for i_episode in range(2000):
    if game_terminator:
        break
    time.sleep(2)
    observation = env.reset()  # the environment must be reset before stepping
    for t in range(1000000):
        env.render()
        action = env.action_space.sample()  # random action
        observation, reward, done, info = env.step(action)
        # The reward is -1 on every step, so test the car position instead.
        if observation[0] >= 0.5:
            print('the flag point is reached at step:', t)
            game_terminator = 1
            break
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
How can I change this setup to help the agent reach the goal faster? Thanks.
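One direction I can think of, as a rough sketch only: replace the random action with a hand-coded "push in the direction of motion" rule, so the car builds up momentum by swinging back and forth. The names `momentum_action`, `simulate_step` and `steps_to_flag` below are hypothetical helpers, and `simulate_step` is just a stand-in whose constants are assumed to mirror gym's classic_control MountainCar dynamics, so that the sketch runs without gym or a display.

```python
import math


def momentum_action(observation):
    """Hypothetical policy: push in the direction the car is already moving."""
    position, velocity = observation
    return 2 if velocity >= 0 else 0  # 2 = push right, 0 = push left


def simulate_step(position, velocity, action):
    """Stand-in for env.step(); constants are assumed to match MountainCarEnv."""
    velocity += (action - 1) * 0.001 - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))
    position = max(-1.2, min(0.6, position + velocity))
    if position <= -1.2 and velocity < 0:
        velocity = 0.0  # the car stops when it hits the left wall
    return position, velocity


def steps_to_flag(start_position=-0.5, max_steps=200000):
    """Run the momentum policy and return the step count, or None on failure."""
    position, velocity = start_position, 0.0
    for t in range(max_steps):
        action = momentum_action((position, velocity))
        position, velocity = simulate_step(position, velocity, action)
        if position >= 0.5:  # flag position
            return t + 1
    return None


if __name__ == "__main__":
    print("flag reached after", steps_to_flag(), "steps")
```

If this behaves the same way in the real environment, the only change to my loop would be swapping `env.action_space.sample()` for `momentum_action(observation)`, provided `observation` holds the latest value returned by `env.reset()` or `env.step()`.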