AssertionError when running my code for Blackjack RL

Time: 2019-01-13 06:18:07

Tags: python reinforcement-learning openai-gym

As the title says.

My code is a simple Q-learning agent for blackjack, but during the learning phase the loop never runs to completion. This is the output:

"C:\Program Files\Anaconda3\envs\untitled4\python.exe" C:/Users/USER/PycharmProjects/untitled4/blackjack_RL.py

[]
[-1.0]
[-1.0]
[-1.0, -1]
[-1.0, -1, 1.0]
[-1.0, -1, 1.0]
[-1.0, -1, 1.0, -1]
[-1.0, -1, 1.0, -1, -1]
[-1.0, -1, 1.0, -1, -1]
[-1.0, -1, 1.0, -1, -1]
[-1.0, -1, 1.0, -1, -1, -1]
[-1.0, -1, 1.0, -1, -1, -1]
[-1.0, -1, 1.0, -1, -1, -1, -1.0]
[-1.0, -1, 1.0, -1, -1, -1, -1.0, -1.0]
[-1.0, -1, 1.0, -1, -1, -1, -1.0, -1.0]
[-1.0, -1, 1.0, -1, -1, -1, -1.0, -1.0, 1.0]
[-1.0, -1, 1.0, -1, -1, -1, -1.0, -1.0, 1.0, -1.0]
[-1.0, -1, 1.0, -1, -1, -1, -1.0, -1.0, 1.0, -1.0, -1.0]
[-1.0, -1, 1.0, -1, -1, -1, -1.0, -1.0, 1.0, -1.0, -1.0, 1.0]
[-1.0, -1, 1.0, -1, -1, -1, -1.0, -1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
[-1.0, -1, 1.0, -1, -1, -1, -1.0, -1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
Traceback (most recent call last):
  File "C:/Users/USER/PycharmProjects/untitled4/blackjack_RL.py", line 26, in <module>
    new_state,reward,done,_=env.step(action)
  File "C:\Program Files\Anaconda3\envs\untitled4\lib\site-packages\gym\envs\toy_text\blackjack.py", line 92, in step
    assert self.action_space.contains(action)
AssertionError

Process finished with exit code 1
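The assertion at line 92 of blackjack.py checks that the action passed to env.step belongs to the environment's action space. For Blackjack that space is Discrete(2) (0 = stick, 1 = hit), so only the integers 0 and 1 pass. The function below is a simplified, hypothetical stand-in for that membership test, not gym's actual code; it only illustrates that any other value makes the assertion fail.

```python
import numbers


def discrete_contains(x, n):
    """Hypothetical, simplified stand-in for a Discrete(n) membership check:
    x must be a single integer with 0 <= x < n."""
    return isinstance(x, numbers.Integral) and 0 <= x < n


# Blackjack's action space has n = 2 (0 = stick, 1 = hit).
for candidate in (0, 1, 3, 5):
    print(candidate, discrete_contains(candidate, 2))
```

Under this check, an action of 3 or 5 is rejected, which is exactly what `assert self.action_space.contains(action)` turns into an AssertionError.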

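The printed list is the list of episode rewards. I thought training was going fine, but then the error suddenly appeared and stopped the run. How can I fix this? I'm using Windows and PyCharm with Anaconda.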

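import gym
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

env = gym.make('Blackjack-v0')

# Q-table: 400 rows (states) x env.action_space.n columns (2 actions: 0 = stick, 1 = hit)
Q = np.zeros([400, env.action_space.n])
num_episodes = 10000
dis = 0.99  # discount factor
rList = []  # total reward of each finished episode

for i in range(num_episodes):
    state = env.reset()  # observation tuple: (player_sum, dealer_showing, usable_ace)
    rALL = 0
    done = False
    e = 1. / ((i / 100) + 1)  # exploration rate, decays as episodes accumulate
    while not done:
        # epsilon-greedy: random action with probability e, otherwise argmax over Q[state, :]
        if np.random.rand(1) < e:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state, :])

        new_state, reward, done, _ = env.step(action)

        # overwrite-style update: reward plus discounted best value of the next state
        Q[state, action] = reward + dis * np.max(Q[new_state, :])
        print(rList)  # prints the rewards of all episodes finished so far
        rALL += reward
        state = new_state

    rList.append(rALL)

print('success rate: ' + str(sum(rList) / num_episodes))
print("Final Q-table values")

print(Q)
plt.bar(range(len(rList)), rList, color='blue')
plt.show()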
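A likely cause, offered as a hedged diagnosis rather than a confirmed answer: Blackjack-v0 returns observations as tuples (player_sum, dealer_showing, usable_ace), not integers. With a tuple, Q[state, :] becomes NumPy fancy indexing that selects three rows (the usable_ace flag False is read as row 0), and np.argmax on that 3x2 slice returns a flattened index between 0 and 5; any value above 1 trips the assertion. The sketch below reproduces this with NumPy alone (no gym needed) and then shows one possible alternative: a 4-D table of shape (32, 11, 2, 2), assumed to match the sizes of Blackjack-v0's observation space. A flat encoding would need 32 * 11 * 2 = 704 rows, more than the 400 used above. The helpers state_index, greedy_action and q_update are hypothetical names, and q_update keeps the question's overwrite rule except that it does not bootstrap from terminal states.

```python
import numpy as np

N_ACTIONS = 2  # Blackjack: 0 = stick, 1 = hit

# --- Reproducing the problem with a tuple state --------------------------
Q_flat = np.zeros([400, N_ACTIONS])
state = (14, 10, False)  # (player_sum, dealer_showing, usable_ace)
Q_flat[10, 1] = 0.5      # pretend an earlier update wrote to row 10, action 1

rows = Q_flat[state, :]  # the tuple acts as a list of row indices: 14, 10, 0
print(rows.shape)        # (3, 2)
bad_action = np.argmax(rows)  # argmax over the flattened 3x2 slice
print(bad_action)        # 3, which is not a valid action

# --- One possible fix: index a 4-D table with plain ints -----------------
# Assumption: player_sum < 32 and dealer_showing < 11, as in Blackjack-v0's
# Tuple(Discrete(32), Discrete(11), Discrete(2)) observation space.
Q = np.zeros((32, 11, 2, N_ACTIONS))


def state_index(obs):
    """Hypothetical helper: convert an observation tuple to int indices."""
    player_sum, dealer_showing, usable_ace = obs
    return int(player_sum), int(dealer_showing), int(usable_ace)


def greedy_action(Q, obs):
    """Return 0 or 1: the argmax over the two action values of one state."""
    return int(np.argmax(Q[state_index(obs)]))


def q_update(Q, obs, action, reward, new_obs, done, dis=0.99):
    """Same overwrite-style target as the question, but no bootstrapping
    from the next state when the episode has ended."""
    target = reward
    if not done:
        target += dis * np.max(Q[state_index(new_obs)])
    Q[state_index(obs) + (action,)] = target
```

In the training loop this would mean calling greedy_action(Q, state) in the else branch and q_update(Q, state, action, reward, new_state, done, dis) after env.step. Because Blackjack is stochastic, an update with a learning rate, Q ← Q + α(target − Q), may learn more stably than overwriting; that is a different update rule from the one in the question.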
