REINFORCE implementation for a continuous action space (Humanoid-v2)?

Posted: 2018-04-12 19:29:53

Tags: python machine-learning deep-learning reinforcement-learning openai-gym

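I have seen several implementations of REINFORCE, a.k.a. the vanilla policy gradient algorithm, applied to reinforcement learning tasks with discrete action spaces. Are there implementations of this algorithm (or of other policy gradient algorithms) for continuous action spaces?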
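For continuous actions, REINFORCE itself does not change; only the policy does. Instead of a softmax over discrete actions, the policy outputs the parameters of a continuous distribution, commonly a Gaussian, and the score ∇θ log πθ(a|s) is computed from that density. Here is a minimal pure-Python sketch, not a reference implementation. Everything in it is a hypothetical illustration: the one-step toy task (state s ~ U(-1, 1), reward -(a - 2s)^2), the linear mean μ = w·s + b, the fixed σ, the mean-reward baseline, and the hyperparameters. Because each episode lasts one step, the return is simply the reward.

```python
import random


def train_reinforce(iterations=1000, batch_size=64, lr=0.05, sigma=0.5, seed=0):
    """REINFORCE with a linear Gaussian policy on a hypothetical one-step task.

    Toy task (an assumption for illustration): state s ~ U(-1, 1),
    continuous action a, reward -(a - 2*s)**2, so the best mean action is 2*s.
    Policy: a ~ N(w*s + b, sigma**2), with sigma held fixed.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        # Collect a batch of one-step episodes; the return equals the reward.
        batch = []
        for _ in range(batch_size):
            s = rng.uniform(-1.0, 1.0)
            mu = w * s + b
            a = rng.gauss(mu, sigma)  # sample a continuous action
            r = -(a - 2.0 * s) ** 2
            batch.append((s, mu, a, r))
        # Mean reward of the batch as a simple variance-reducing baseline.
        baseline = sum(item[3] for item in batch) / batch_size
        grad_w = grad_b = 0.0
        for s, mu, a, r in batch:
            # Gaussian score w.r.t. its mean: d/dmu log N(a; mu, sigma^2) = (a - mu) / sigma^2.
            # The chain rule through mu = w*s + b contributes s (for w) and 1 (for b).
            score = (a - mu) / sigma ** 2
            grad_w += (r - baseline) * score * s
            grad_b += (r - baseline) * score
        # Gradient ascent on the estimated expected return.
        w += lr * grad_w / batch_size
        b += lr * grad_b / batch_size
    return w, b
```

Calling `w, b = train_reinforce()` should move `w` toward 2 and `b` toward 0, the optimal mean for this toy reward. On a real task the linear mean would become a neural network and per-step rewards would be summed into discounted returns, but the score-function update keeps the same form.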

More specifically, is it possible to implement REINFORCE for bipedal locomotion, i.e. "Humanoid-v2" from OpenAI Gym?
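In Humanoid-v2 the action is a vector, so a usual choice is a diagonal Gaussian policy: one mean (and one log standard deviation) per action dimension, independent dimensions, and a log-probability equal to the sum of the per-dimension log-densities. This sketch assumes that Humanoid-v2's action space is a 17-dimensional Box bounded in [-0.4, 0.4]; that is my understanding, so check `env.action_space` before relying on it. The helper names are hypothetical, and the network that would produce `mean` and `log_std` from an observation is omitted.

```python
import math
import random

# Assumed Humanoid-v2 action space: 17 actuators, each bounded in [-0.4, 0.4].
# Verify with gym.make("Humanoid-v2").action_space before relying on these numbers.
ACTION_DIM = 17
ACTION_LOW, ACTION_HIGH = -0.4, 0.4


def sample_action(mean, log_std, rng):
    """Sample from a diagonal Gaussian; return (raw_action, clipped_action)."""
    raw = [rng.gauss(m, math.exp(ls)) for m, ls in zip(mean, log_std)]
    # Only the copy sent to the environment is clipped to the Box bounds.
    clipped = [min(max(x, ACTION_LOW), ACTION_HIGH) for x in raw]
    return raw, clipped


def diag_gaussian_log_prob(action, mean, log_std):
    """Sum of per-dimension Gaussian log-densities (dimensions are independent)."""
    total = 0.0
    for a, m, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        total += -0.5 * ((a - m) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return total
```

A common design choice, followed here, is to compute the log-probability on the raw sample and clip only the action passed to `env.step`, so the REINFORCE gradient matches the distribution that was actually sampled.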

Thanks.

1 Answer

Answer 0 (score: 0)

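You can use the Stable Baselines package: https://github.com/hill-a/stable-baselines. It implements several policy gradient algorithms that support continuous action spaces, such as PPO2 (used below).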

Training an agent is simple:

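import gym
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2

# Humanoid-v2 requires MuJoCo and mujoco-py to be installed
my_env_id = 'Humanoid-v2'

# The algorithms require a vectorized environment. DummyVecEnv takes a list of
# functions that each build one environment (here a single env, run in-process);
# vectorized environments also make it easy to parallelize training
env = DummyVecEnv([lambda: gym.make(my_env_id)])

# PPO2 is a policy gradient method; for a continuous (Box) action space,
# MlpPolicy parameterizes a diagonal Gaussian distribution over actions
model = PPO2(MlpPolicy, env, verbose=1)

# Train the agent. 10000 timesteps only demonstrates the API; Humanoid
# typically needs on the order of millions of timesteps to learn to walk
model.learn(total_timesteps=10000)

# Watch the trained agent
obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs)
    # Vectorized envs reset finished episodes automatically
    obs, rewards, dones, info = env.step(action)
    env.render()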
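The training code wraps the environment in DummyVecEnv. Roughly, DummyVecEnv calls each zero-argument function in the list to build one environment, steps all of them in lockstep in the current process, and resets an environment automatically when its episode ends. The toy below imitates that behavior under simplifying assumptions (plain lists instead of NumPy arrays, no terminal-observation bookkeeping); `CountdownEnv` and `ToyVecEnv` are hypothetical names, not part of Stable Baselines.

```python
class CountdownEnv:
    """Hypothetical toy env: the episode ends after `length` steps, reward 1.0 per step."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class ToyVecEnv:
    """Simplified sketch of the idea behind DummyVecEnv: step several envs in lockstep.

    Like DummyVecEnv, it takes zero-argument factories, so each call builds
    a fresh, independent environment instance.
    """

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if d:
                o = env.reset()  # auto-reset so the batch never holds a finished env
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(info)
        return obs, rewards, dones, infos
```

For example, with `ToyVecEnv([lambda: CountdownEnv(2), lambda: CountdownEnv(3)])`, after `reset()` and two calls to `step([None, None])` the first environment has finished and been reset, so the observations are `[0, 2]` and the done flags are `[True, False]`. This is also why the loop that renders the trained agent never calls `env.reset()` itself.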