Deep Q-learning fails to learn Pong

Asked: 2018-11-16 04:35:59

Tags: python keras deep-learning artificial-intelligence reinforcement-learning

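I recently started studying Deep Q-learning and have run into a problem. Following articles online (including https://keon.io/deep-q-learning/), I tried to adapt the CartPole example to work with the Pong game. However, even though my solution seems to follow what other approaches do (minimizing the state, keeping the frame difference, etc.), the AI never gets better (more than 800 games in a single run).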

The base code:

import numpy as np
from ddqn import DQNAgent
import gym
from PIL import Image

EPISODES = 1000  # not shown in the original post; assumed value

env = gym.make('Pong-v0')
state_size = 88*80  # 210x160 frame, cropped to 176x160, then every 2nd pixel kept
action_size = env.action_space.n
agent = DQNAgent(state_size, action_size)
done = False
batch_size = 32
for e in range(EPISODES):
    state = env.reset()
    prev_state = state
    total_reward = 0
    for time in range(500000):
        env.render()
        # The agent sees the difference between two consecutive preprocessed frames.
        current_state_diff = minimize_state(state) - minimize_state(prev_state)
        action = agent.act(current_state_diff)
        next_state, reward, done, _ = env.step(action)
        total_reward += reward
        next_state_diff = minimize_state(next_state) - minimize_state(state)
        agent.remember(current_state_diff, action, reward, next_state_diff, done)
        # Advance both frames; otherwise the diff is always taken against the first frame.
        prev_state = state
        state = next_state
        if done:
            agent.update_target_model()
            print("episode: {}/{}, score: {}, e: {:.2}"
                  .format(e, EPISODES, total_reward, agent.epsilon))
            break
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)
    if e % 10 == 0:
        agent.save("./save/pong-v0-ddqn.h5")
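
One thing worth checking in a loop like this is how quickly exploration shuts off, since `agent.replay` runs on every step once the memory holds more than `batch_size` transitions. The sketch below assumes the agent multiplies epsilon by a fixed factor on each replay call. The helper name `steps_until_floor` and the values 1.0, 0.99 and 0.01 are illustrative assumptions, not values confirmed by the post.

```python
def steps_until_floor(epsilon, decay, floor):
    """Count multiplicative decay steps until epsilon is at or below the floor."""
    steps = 0
    while epsilon > floor:
        epsilon *= decay
        steps += 1
    return steps


# Hypothetical schedule: start at 1.0, multiply by 0.99 per replay call, floor at 0.01.
n_steps = steps_until_floor(1.0, 0.99, 0.01)
print("replay calls until epsilon reaches its floor:", n_steps)
```

If that count turns out to be small compared with the number of steps in a single game, the agent is acting almost greedily before it has seen many rewards. Decaying once per episode, or using a factor much closer to 1, would be one way to keep it exploring longer.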

where the minimize function is:

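def minimize_state(state):
    # Convert the RGB frame to grayscale.
    img = Image.fromarray(state)
    # Cast to float so that subtracting two frames cannot wrap around as uint8 would.
    state = np.array(img.convert("L")).astype(np.float32)
    state = state[34:]        # drop the top 34 rows: 210x160 -> 176x160
    state = state[::2, ::2]   # keep every second row and column: 176x160 -> 88x80
    return np.reshape(state, [1, state_size])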
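
Because the network is fed frame differences, the dtype of the preprocessed frames matters. An array built from a PIL image in mode "L" is uint8, and uint8 subtraction in NumPy wraps modulo 256 instead of going negative. A minimal sketch of the effect, using made-up 1x2 arrays in place of real frames and a hypothetical helper `frame_diff`:

```python
import numpy as np


def frame_diff(curr, prev):
    """Signed difference of two grayscale frames, computed in float32."""
    return curr.astype(np.float32) - prev.astype(np.float32)


# Tiny stand-in "frames"; real Pong frames would be 88x80 after preprocessing.
prev = np.array([[0, 200]], dtype=np.uint8)
curr = np.array([[100, 50]], dtype=np.uint8)

wrapped = curr - prev            # uint8 arithmetic: 50 - 200 wraps around to 106
signed = frame_diff(curr, prev)  # float arithmetic: 50 - 200 stays -150

print(wrapped.dtype, wrapped)
print(signed.dtype, signed)
```

With the wrapped version, a pixel that gets darker shows up as a large positive value, so the network cannot tell which direction the ball or paddle moved. Casting before the subtraction keeps the sign.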

DQNAgent comes from: https://github.com/keon/deep-q-learning

What is going wrong?

0 answers:

No answers.