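I recently started looking into Deep Q-learning and have run into a problem. Following articles online (including https://keon.io/deep-q-learning/), I tried to adapt the CartPole example to work with Pong. However, even though my solution seems to follow the same approach as others (minimizing the state, keeping the frame difference, etc.), the AI never gets any better (over 800 games in a single run).
The basic code: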
import gym
import numpy as np
from PIL import Image

from ddqn import DQNAgent

EPISODES = 1000  # assumed value; the original post does not show this definition

env = gym.make('Pong-v0')
state_size = 88 * 80  # frame size after cropping and downsampling in minimize_state
action_size = env.action_space.n
agent = DQNAgent(state_size, action_size)
done = False
batch_size = 32

for e in range(EPISODES):
    state = env.reset()
    prev_state = state
    total_reward = 0
    for time in range(500000):
        env.render()
        # Feed the agent the difference between two preprocessed frames.
        current_state_diff = minimize_state(state) - minimize_state(prev_state)
        action = agent.act(current_state_diff)
        next_state, reward, done, _ = env.step(action)
        total_reward += reward
        next_state_diff = minimize_state(next_state) - minimize_state(state)
        agent.remember(current_state_diff, action, reward, next_state_diff, done)
        state = next_state
        if done:
            # Sync the target network at the end of each episode.
            agent.update_target_model()
            print("episode: {}/{}, score: {}, e: {:.2}"
                  .format(e, EPISODES, total_reward, agent.epsilon))
            break
        # Train on a minibatch from replay memory once enough transitions are stored.
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)
    if e % 10 == 0:
        agent.save("./save/pong-v0-ddqn.h5")
The minimize_state function is:
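def minimize_state(state):
    # Convert the RGB frame to grayscale.
    img = Image.fromarray(state)
    state = np.array(img.convert("L"))
    # Crop the top 34 rows, then keep every second row and column (88x80).
    state = state[34:]
    state = state[::2, ::2]
    return np.reshape(state, [1, state_size])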
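In case it helps narrow things down, here is a quick sanity check of the preprocessing. It is only a sketch under assumptions: it uses synthetic 210x160x3 uint8 frames instead of gym, and a channel mean instead of PIL's convert("L"). The names STATE_SIZE and minimize_state_signed are made up for this check and are not part of the original code. It confirms that the crop and downsample give 88*80 values, and shows how a uint8 difference behaves compared with a float32 one. Whether this is related to the agent not learning is not established here.

```python
import numpy as np

STATE_SIZE = 88 * 80  # same value as state_size in the post


def minimize_state_signed(frame):
    """Hypothetical variant of minimize_state that returns float32 values."""
    # Channel mean is a stand-in for PIL's convert("L"); it is not the same formula.
    gray = frame.astype(np.float32).mean(axis=2)
    gray = gray[34:]        # drop the top 34 rows, as in the post
    gray = gray[::2, ::2]   # keep every second row and column
    return gray.reshape(1, STATE_SIZE)


# Two synthetic frames with the shape and dtype Pong-v0 is assumed to return.
prev_frame = np.full((210, 160, 3), 200, dtype=np.uint8)
next_frame = np.full((210, 160, 3), 100, dtype=np.uint8)

# Subtracting uint8 arrays wraps around modulo 256 instead of going negative.
wrapped = next_frame[34:, :, 0][::2, ::2] - prev_frame[34:, :, 0][::2, ::2]

# Subtracting float32 arrays keeps the sign of the change.
signed = minimize_state_signed(next_frame) - minimize_state_signed(prev_frame)

print(wrapped.shape, wrapped.dtype, wrapped[0, 0])  # (88, 80) uint8 156
print(signed.shape, signed.dtype, signed[0, 0])     # (1, 7040) float32 -100.0
```

A pixel that goes from 200 to 100 shows up as 156 in the uint8 difference and as -100.0 in the float32 one.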
DQNAgent comes from: https://github.com/keon/deep-q-learning
What is going wrong?
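To make explicit which frames each difference compares in the training loop above, here is a small trace with integers standing in for frames. It is a sketch of the bookkeeping only: no gym, no agent, and the names frames and pairs are made up for illustration.

```python
# Integers stand in for frames so the bookkeeping is easy to read.
frames = [0, 1, 2, 3, 4]  # frames[0] plays the role of env.reset()

state = frames[0]
prev_state = state  # assigned once per episode, as in the training loop
pairs = []
for next_state in frames[1:]:
    current_pair = (state, prev_state)  # frames compared in current_state_diff
    next_pair = (next_state, state)     # frames compared in next_state_diff
    pairs.append((current_pair, next_pair))
    state = next_state                  # prev_state is not reassigned here

for current_pair, next_pair in pairs:
    print(current_pair, next_pair)
# (0, 0) (1, 0)
# (1, 0) (2, 1)
# (2, 0) (3, 2)
# (3, 0) (4, 3)
```

In this trace, current_state_diff always compares against frame 0, while next_state_diff compares against the immediately preceding frame.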