Expected type 'int', got 'ndarray[int]' instead

Asked: 2019-02-04 09:29:28

Tags: python python-3.x reinforcement-learning openai-gym

I am using OpenAI's Gym environments together with Christian Kauten's Mario code to teach an agent how to play the game. However, in the second code block shown below, the line that takes the action raises the following error: Expected type 'int', got 'ndarray[int]' instead

When I run the code I also get: "Python[19457:566482] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to (null)". How can I fix this?
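The type warning most likely comes from `np.argmax`: NumPy's type stubs declare an `ndarray` return type, so a checker flags passing its result to `env.step`, which expects a plain `int`. A minimal sketch of the mismatch and the usual cast that silences it (the table dimensions here are made up for illustration):

```python
import numpy as np

q_table = np.zeros((10, 7))
state = 3

action = np.argmax(q_table[state, :])
print(type(action))   # a NumPy integer type, which the type checker flags

action = int(action)  # cast to a plain Python int before env.step(action)
print(type(action))   # <class 'int'>
```

The `ApplePersistenceIgnoreState` line is a harmless macOS window-state message, unrelated to the code.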

The Q-learning algorithm is as follows:

import random
import time

import numpy as np

action_space_size = len(SIMPLE_MOVEMENT)
state_space_size = 10000

q_table = np.zeros((state_space_size, action_space_size))

max_episodes = 1
max_steps_per_episode = 10

learning_rate = 0.1
discount_rate = 0.99
exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
explore_decay_rate = 0.01

reward_all_episodes = []

curr_available_action = 0
curr_available_state = 0
for episode in range(max_episodes):
    state = env.reset()
    # dict_state[state] = curr_available_state
    curr_available_state += 1

    done = False
    rewards_curr_episode = 0

    for step in range(max_steps_per_episode):

        exploration_rate_threshold = random.uniform(0, 1)

        if exploration_rate_threshold > exploration_rate:
            action = np.argmax(q_table[state, :])
            print(action)
        else:
            action = env.action_space.sample()

        new_state, reward, done, info = env.step(action)

        q_table[state, action] = q_table[state, action] * (1 - learning_rate) + learning_rate * (
                reward + discount_rate * np.max(q_table[new_state, :]))

        state = new_state
        rewards_curr_episode += reward

        if done:
            break

    exploration_rate = min_exploration_rate + (max_exploration_rate - min_exploration_rate) * np.exp(
        -explore_decay_rate * episode)

    reward_all_episodes.append(rewards_curr_episode)
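Note also that in this Gym environment `env.reset()` returns the raw screen as a NumPy pixel array, not an integer in `[0, state_space_size)`, so `q_table[state, :]` cannot be indexed with it directly. A minimal sketch of one possible workaround, assuming frames are discretized by hashing (`state_to_index` is a hypothetical helper, not part of the original code):

```python
import numpy as np

state_space_size = 10000

def state_to_index(obs):
    # Hypothetical helper: collapse a pixel-frame observation into a
    # Q-table row index by hashing its raw bytes. Collisions are possible;
    # this is only a sketch, not a recommended state representation.
    return hash(obs.tobytes()) % state_space_size

frame = np.zeros((240, 256, 3), dtype=np.uint8)  # shape of an NES frame
idx = state_to_index(frame)
print(0 <= idx < state_space_size)  # True
```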

The code that plays Mario is as follows:

for episode in range(1):
    state = env.reset()
    done = False
    print("***Episode", episode + 1, "***\n\n\n\n\n")
    time.sleep(1)

    rewards_curr_episode = 0
    for step in range(max_steps_per_episode):
        # clear_output(wait = True)
        env.render()
        time.sleep(0.3)

        action = np.argmax(q_table[state, :])
        new_state, reward, done, info = env.step(action)

        rewards_curr_episode += reward

        if done:
            # clear_output(wait=True)
            env.render()
            print("Total Reward is: " + str(reward))
            time.sleep(3)
            break

        state = new_state

env.close()

The actions are defined as follows:

SIMPLE_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
]
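`SIMPLE_MOVEMENT` defines 7 discrete button combinations, and each Q-table column corresponds to one position in this list. A quick standalone sanity check (no emulator needed):

```python
SIMPLE_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
]

action_space_size = len(SIMPLE_MOVEMENT)
print(action_space_size)      # 7, the number of Q-table columns
print(SIMPLE_MOVEMENT[2])     # action index 2 presses right + A (run-jump)
```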

0 Answers