I'm quite new to the OpenAI environment. Basically, I'm using https://github.com/ugo-nama-kun/gym_torcs/tree/master/vtorcs-RL-color to try out different reinforcement learning agents.
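So I wrote my own REINFORCE and GPOMDP agents. In them I first create the environment:

```
env = TorcsEnv(vision=vision, throttle=False)
```

and then call methods such as env.reset(), env.step(), ... Everything works fine and training starts without problems.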
Now I want to try the baselines library (https://github.com/openai/baselines) on this Gym-torcs env. So I took https://github.com/openai/baselines/blob/master/baselines/trpo_mpi/run_mujoco.py as an example and replaced

```
env = make_mujoco_env(env_id, workerseed)
```

with

```
env = TorcsEnv(vision=vision, throttle=False)
```
Torcs starts correctly, but when the car should start driving I get the following error:
```
Traceback (most recent call last):
  File "myAgent.py", line 39, in <module>
    main()
  File "myAgent.py", line 35, in main
    train(args.env, num_timesteps=args.num_timesteps, seed=args.seed)
  File "myAgent.py", line 30, in train
    max_timesteps=1000, gamma=0.99, lam=0.98, vf_iters=5, vf_stepsize=1e-3)
  File "/usr/src/baselines/baselines/trpo_mpi/trpo_mpi.py", line 199, in learn
    seg = seg_gen.__next__()
  File "/usr/src/baselines/baselines/trpo_mpi/trpo_mpi.py", line 36, in traj_segment_generator
    ac, vpred = pi.act(stochastic, ob)
  File "/usr/src/baselines/baselines/ppo1/mlp_policy.py", line 54, in act
    ac1, vpred1 = self._act(stochastic, ob[None])
  File "/usr/src/baselines/baselines/common/tf_util.py", line 194, in __call__
    results = tf.get_default_session().run(self.outputs_update, feed_dict=feed_dict)[:-1]
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1104, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "/home/nicolobrunello/.local/lib/python3.5/site-packages/numpy/core/numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
```
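The final `ValueError` is what NumPy raises when it is asked to pack a ragged collection (scalars mixed with vectors of different lengths) into a single numeric array, which is what `np.asarray(subfeed_val, dtype=subfeed_dtype)` does when TensorFlow feeds the observation. The sketch below is hypothetical and assumes the TorcsEnv observation is a namedtuple mixing scalar readings and sensor vectors; the field names and sizes are illustrative assumptions, not necessarily the exact gym_torcs ones, and `flatten_obs` / `FlattenObsWrapper` are made-up helpers, not part of gym_torcs or baselines. It reproduces the error and shows one way to turn such an observation into a flat `float32` vector before it reaches the policy. Note that baselines' `MlpPolicy` also builds its input placeholder from `env.observation_space.shape`, so a real wrapper would presumably also need to expose a `gym.spaces.Box` of matching shape; that part is omitted here.

```python
import collections

import numpy as np

# Hypothetical stand-in for a TORCS-style observation: a namedtuple whose
# fields mix scalars and vectors of different lengths. Field names and sizes
# are illustrative assumptions only.
Observation = collections.namedtuple(
    "Observation", ["speedX", "angle", "track", "wheelSpinVel"]
)


def make_fake_obs():
    """Build one observation: two scalars, a 19-vector and a 4-vector."""
    return Observation(
        speedX=0.3,
        angle=-0.01,
        track=np.linspace(0.0, 1.0, 19),
        wheelSpinVel=np.zeros(4),
    )


def flatten_obs(obs):
    """Concatenate every field, in a fixed order, into one 1-D float32 vector."""
    parts = [np.atleast_1d(np.asarray(field, dtype=np.float32)).ravel() for field in obs]
    return np.concatenate(parts)


class FlattenObsWrapper:
    """Hypothetical minimal wrapper: forwards reset/step to the wrapped env
    and flattens each observation before returning it."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return flatten_obs(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return flatten_obs(obs), reward, done, info


if __name__ == "__main__":
    obs = make_fake_obs()
    try:
        # Same kind of conversion TensorFlow performs on the fed value:
        # the ragged fields make NumPy raise ValueError.
        np.asarray(obs, dtype=np.float32)
    except ValueError as err:
        print("ValueError:", err)
    flat = flatten_obs(obs)
    print(flat.shape, flat.dtype)  # 1 + 1 + 19 + 4 entries -> (25,) float32
```

Flattening with a fixed field order means the same index always refers to the same sensor at every step, which is what an MLP policy expects from its input vector.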
Does anyone know how I should integrate Baselines with Gym-torcs?
P.S.: I'm using Python 3.5.2 on 64-bit Ubuntu 16.04.4.