I've run into some trouble after following along with a Keras tutorial. I have some code that solves the Lunar Lander problem. It appears to train the agent, and after many episodes it earns decent scores (for Lunar Lander it typically ends up somewhere between 200 and 400), but when I load the trained model it acts as though it's starting from scratch. I don't know whether I'm doing something wrong or whether Keras is at fault here, and I could really use some advice.
At the bottom I've included some of the final training scores, along with some of the scores I got when re-running with the loaded model.
import random
import os
import gym
import numpy as np
from collections import deque
from keras.models import Sequential
from keras.models import load_model
from keras.layers import Dense
from keras.optimizers import Adam

EPISODES = 1000

class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0   # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()

    def _build_model(self):
        # Neural Net for Deep-Q learning Model
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse',
                      optimizer=Adam(lr=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])  # returns action

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = (reward + self.gamma *
                          np.amax(self.model.predict(next_state)[0]))
            target_f = self.model.predict(state)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def load(self, name):
        #self.model.load(name)
        pass

    def save(self, name):
        self.model.save(name)

#if __name__ == "__main__":
loaded = False
env = gym.make('LunarLander-v2')
state_size = env.observation_space.shape[0]
action_size = env.action_space.n
agent = DQNAgent(state_size, action_size)
# agent.load("./save/cartpole-dqn.h5")
if os.path.exists('cart_save.h5') == True:
    #agent.model.load('cart_save.h5')
    agent.model = load_model('cart_save.h5')
    loaded = False
done = False
batch_size = 32

print
print loaded
print

for e in range(EPISODES):
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    time_rec = -1
    total_reward = 0
    end = False
    #for time in range(500):
    while not end:
        time_rec += 1
        # env.render()
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        #reward = reward if not done else -10
        total_reward += reward
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, total_reward, next_state, done)
        state = next_state
        if loaded:
            env.render()
        if done:
            print("episode: {}/{}, score: {}, e: {:.2}"
                  .format(e, EPISODES, total_reward, agent.epsilon))
            break
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)
    if done and e > 50 and agent.epsilon < .05 and total_reward > 200:
        agent.model.save("cart2_save.h5")
        print
        print "saved file"
        print
    if e == 200:
        agent.model.save("cart_final.h5")
    if e == 400:
        agent.model.save("cart_final.h5")
    if e == 600:
        agent.model.save("cart_final.h5")
    if e >= 1000:
        agent.model.save("cart_final.h5")
Answer 0 (score: 1)
This question is old, but maybe it can help someone else. I ran into the same problem and eventually realized it had nothing to do with loading the model.
The problem is this line:
def __init__(self, state_size, action_size):
    ...
    self.epsilon = 1.0  # exploration rate
During training, epsilon is decayed every episode, so the agent explores less and less over time. However, if you instantiate a new agent and then load the model into it, you have the correct model but a fully exploring agent (epsilon=1). When loading a trained model you want epsilon=0, since you're not doing any more training (or use a separate inference function, outside of training, that ignores epsilon when predicting).
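For example, a minimal sketch of loading for inference, reusing the `DQNAgent` and the filename from the question's own code:

from keras.models import load_model

agent = DQNAgent(state_size, action_size)
agent.model = load_model('cart_save.h5')
agent.epsilon = 0.0  # disable exploration: act() will now always take the argmax action

If you intend to keep training after loading, setting agent.epsilon = agent.epsilon_min instead preserves a small amount of exploration.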
Another way to look at it is to realize that for prediction only (no training), you don't need the agent at all. You can use the model by itself:
model = load_model("model_name.h5")
# predict() returns the Q-values for every action; take the argmax to get the action
action = np.argmax(model.predict(state)[0])
Answer 1 (score: 0)
I'd wager that you're not loading what you think you're loading. Maybe I missed it, but the code you posted doesn't appear to include writing the trained model out to disk, so I'm not sure how the model you're loading got there. Furthermore, you check whether one saved model file exists, but then ignore that model and (attempt to) load a different one. Based on your output, it seems 900.h5 may not be the model you're trying to load.
The code is a bit hard to read because the indentation appears to have gotten mangled. Also, are you using Python 2? You should really switch to Python 3.
My suggestion would be to move all your saved models into a subdirectory the script doesn't know about, e.g. "./archive" or similar, and then try re-running it. That way you can be certain that the model being loaded is a saved checkpoint produced by the most recent run of the script, because right now it isn't clear to me that this is the case.
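As a rough sketch of keeping the existence check, the load, and the save consistent (MODEL_PATH is a name I'm introducing for this example, not something from the original post):

import os
from keras.models import load_model

MODEL_PATH = 'cart2_save.h5'  # hypothetical: must match the filename passed to agent.model.save(...)

if os.path.exists(MODEL_PATH):
    agent.model = load_model(MODEL_PATH)
    loaded = True  # note: the posted code sets this back to False even after a successful load
    print("loaded checkpoint: " + MODEL_PATH)
else:
    print("no checkpoint found, training from scratch")

Printing which file was actually loaded makes it obvious on every run whether you are starting from a checkpoint or from scratch.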
Answer 2 (score: 0)
You may want to try:
from tensorflow.keras.models import load_model

# Save the whole model (architecture, weights, and optimizer state)
model.save('my_model.h5')

new_model = load_model('my_model.h5')
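As a quick sanity check (a sketch, where state stands for any single preprocessed observation of shape (1, state_size)), you can confirm that the save/load round trip preserved the weights:

import numpy as np

# Identical inputs should produce identical Q-values before and after the round trip.
assert np.allclose(model.predict(state), new_model.predict(state))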