I have simulated a game in Python. When the game is invoked, it plays a full round-robin between a random player, a decision-tree player, and an RL player. Whenever the learner has to make a decision, the game calls the run_network() function, which returns an action. When a complete game finishes, update_reward() is run.
My rewards are delayed: any number of states and actions can occur between rewards. Only positive rewards are appended to the pos_rewards list of states and actions, and entries are never removed.
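For concreteness, here is the row layout that update_reward() below assumes (a sketch; I am inferring that the first three columns are bookkeeping values, since both functions slice them off):

import numpy as np

# One pos_rewards row: 3 bookkeeping columns + 176 state features + the action taken
#   [meta0, meta1, meta2, f0, ..., f175, action]
row = np.concatenate([
    np.zeros(3),            # bookkeeping, skipped by the [:, 3:-1] slice
    np.random.rand(176),    # the 176 state features fed to the network
    [4],                    # the action (0-6), used as the training label
])
assert row.shape == (180,)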
At the end of each game I print a record of who won. Each game takes about a minute, and the games keep getting longer, but the RL player has not yet won a single one.
from keras.models import model_from_json
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
import Game
def update_reward(pos_rewards):
    vec = np.asarray(pos_rewards)
    x = vec[:, 3:-1]   # the 176 state features
    y = vec[:, -1]     # the action taken, 0-6
    # One-hot encode the 7 possible actions
    y_cat = np.eye(7)[y.astype(int)]
    model.fit(x, y_cat, batch_size=5000, epochs=1, verbose=0)
    model.save_weights(model_file)
    with open(arch_file, "w") as json_file:
        json_file.write(model.to_json())
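# Note: pos_rewards only ever grows, so every call to update_reward() fits on a
# larger array and takes longer. One option I am weighing (hypothetical, not in
# the current code) is fitting on only the most recent rows, e.g. vec[-50000:].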
def run_network(state):
    x = np.array(state[3:])   # drop the 3 bookkeeping values, keep the 176 features
    x = x.reshape(1, 176)
    p = model.predict(x)      # shape (1, 7)
    # argmax over the action axis; axis=0 would index into the batch dimension
    action = int(p.argmax(axis=1)[0])
    return action
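# A drop-in exploration variant I am considering (sketch only, not wired in yet):
# epsilon-greedy -- with probability eps take a random action instead of the
# argmax, so the RL player actually explores instead of always exploiting.
def run_network_eps(state, eps=0.1):
    if np.random.rand() < eps:
        return np.random.randint(7)            # explore: uniform over the 7 actions
    x = np.array(state[3:]).reshape(1, 176)
    p = model.predict(x)                       # relies on the global model below
    return int(p.argmax(axis=1)[0])            # exploit: greedy action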
arch_file = 'D:\\model\\rl_arch.json'
model_file = 'D:\\model\\rl_model.h5'
start_new = 0
if start_new == 1:
    model = Sequential()
    model.add(Dense(units=500, activation='relu', input_dim=176))
    ...
    model.add(Dense(units=25, activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(7, activation='softmax'))
else:
    with open(arch_file, 'r') as json_file:
        model = model_from_json(json_file.read())
    model.load_weights(model_file)
model.compile(loss='categorical_crossentropy', optimizer='adam')
Game.tournament(10000)
Questions:
When I save the weights and later decide to re-run the whole process, are my saved weights simply overwritten, or does the model keep improving from run to run?
Are there any statistical techniques I can use to improve the model's performance without completely rewriting the code?
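To make the first question concrete, here is a quick check I could run (a sketch, reusing arch_file, model_file, and Game from above, with a hypothetical shorter tournament): snapshot the weights on disk, run a tournament, reload, and compare.

import numpy as np
from keras.models import model_from_json

with open(arch_file) as f:
    probe = model_from_json(f.read())
probe.load_weights(model_file)
w_before = [w.copy() for w in probe.get_weights()]

Game.tournament(100)                # hypothetical shorter run, just for the check

probe.load_weights(model_file)      # reload whatever the run left on disk
w_after = probe.get_weights()
print('weights changed on disk:',
      any(not np.allclose(b, a) for b, a in zip(w_before, w_after)))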