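I have a homework assignment that uses MCTS (http://mcts.ai/code/python.html) to play tic-tac-toe. The goal of the assignment is to train a decision tree classifier that predicts the best move given the current state of the game and the player whose turn it is. Each cell of the grid is labeled 1.0 or 2.0 depending on which player marked that position, or 0 if no player has. So far I've managed to save the data in CSV format, which looks like this when loaded: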
   Unnamed: 0  player    0    1    2  ...    6    7    8  best_move  won
0           0     1.0  0.0  0.0  0.0  ...  0.0  0.0  0.0          4    0
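My first and main question is how to use scikit-learn to build a decision tree classifier that merges all equal states, i.e., the root should offer 9 decisions for the first player, then 8 decisions for the second player, and so on, alternating between players (1.0 for player 1, 2.0 for player 2). My second, related question is how to handle the fact that the data repeats in intervals of 0-8 (9 rows), so that after the 9th row has been read, reading starts again from the beginning of the next game. Ideally, identical sub-states for player 1 or player 2 would be grouped together.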
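To make the first question concrete, here is a minimal sketch of what "merging equal states" could look like before training. It assumes that merging means collapsing rows with the same `player` and board cells into one row and keeping the most frequent `best_move`; the helper `merge_equal_states`, the constant `STATE_COLS`, and the tiny hand-made sample are hypothetical and not part of the assignment or of the real MCTS output.

```python
import pandas as pd
from sklearn import tree

# Feature columns: the player to move plus the nine board cells "0".."8".
STATE_COLS = ["player"] + [str(i) for i in range(9)]


def merge_equal_states(df):
    """Collapse identical (player, board) rows, keeping the most frequent best_move."""
    return (
        df.groupby(STATE_COLS)["best_move"]
        .agg(lambda moves: moves.value_counts().idxmax())
        .reset_index()
    )


# Hand-made sample (an assumption, not real data): the empty board appears three times.
sample = pd.DataFrame(
    [
        [1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
        [1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
        [1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [2.0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    ],
    columns=STATE_COLS + ["best_move"],
)

merged = merge_equal_states(sample)

# Train on one row per distinct state instead of one row per recorded move.
clf = tree.DecisionTreeClassifier()
clf.fit(merged[STATE_COLS], merged["best_move"])
```

One caveat with this idea: scikit-learn decision trees split on a single feature threshold at each node, so every node is binary. Even with merged states, the root would not literally branch into 9 moves; the 9 possible first moves would appear as class labels at the leaves.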
Here is a PDF view of the tree generated by my code. Below is the code I used to train the decision tree.
import graphviz
import pandas as pd
from sklearn import tree


def visualise_tree(trained_tree):
    # Export the fitted tree to Graphviz DOT and render it (PDF by default).
    dot_data = tree.export_graphviz(trained_tree, out_file=None)
    graph = graphviz.Source(dot_data)
    graph.render("oxo")


def trainTree(read_csv):
    clf = tree.DecisionTreeClassifier()
    # Features: the player to move plus the nine board cells.
    slice_training_data = read_csv[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"]]
    # Target: the move chosen for this state.
    slice_prediction_data = read_csv[["best_move"]]
    clf.fit(slice_training_data, slice_prediction_data)
    visualise_tree(clf)
    print(read_csv)


if __name__ == "__main__":
    """ Play a single game to the end using UCT for both players.
    """
    # The commented-out lines generated the CSV that is loaded below.
    #df = pd.DataFrame(columns=["player", "0", "1", "2", "3", "4", "5", "6", "7", "8", "best_move","won"])
    #for i in range(1):
    #    df = UCTPlayGame(df)
    read_csv = pd.read_csv('10000games.csv')
    trainTree(read_csv)
    #df = df[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8", "best_move","won"]]
    #print(df)
    #df.to_csv('10000games.csv')
Here is the data format:
,player,0,1,2,3,4,5,6,7,8,best_move,won
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4,0
1,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0
2,1.0,2.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1,0
3,2.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,7,0
4,1.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,2.0,0.0,3,0
5,2.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,5,0
6,1.0,2.0,1.0,0.0,1.0,1.0,2.0,0.0,2.0,0.0,2,0
7,2.0,2.0,1.0,1.0,1.0,1.0,2.0,0.0,2.0,0.0,6,0
8,1.0,2.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,0.0,8,0
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
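You can see that 9 moves are made, and then the dataset repeats for a new game (the index starts again from 0). The player column alternates between 1.0 and 2.0 as the players take turns. Beyond what the assignment requires, I also added a won column for the set of moves that won (but I am not sure how to use it, so I did not include it in the prediction data). Ideally, the decision tree should merge all identical game states and predict the best move.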
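For the second question, one possible way to recover the game boundaries is to count how often the saved index resets to 0. This is only a sketch: it assumes the index column is read back by `pd.read_csv` under the name `Unnamed: 0`, as in the printout above, and that every game starts at index 0. The helper `add_game_id` and the `game_id` column are hypothetical names, and the sample frame is hand-made.

```python
import pandas as pd


def add_game_id(df, index_col="Unnamed: 0"):
    """Tag each row with a game number by counting resets of the per-game index to 0."""
    out = df.copy()
    # Each 0 in the index column marks the first move of a new game.
    out["game_id"] = (out[index_col] == 0).cumsum() - 1
    return out


# Hand-made sample (an assumption): one game of three moves, then one of two.
sample = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1, 2, 0, 1],
        "player": [1.0, 2.0, 1.0, 1.0, 2.0],
        "won": [0, 0, 0, 0, 0],
    }
)

tagged = add_game_id(sample)
moves_per_game = tagged.groupby("game_id").size()
```

Detecting the reset rather than slicing every 9 rows should also cope with games that end before the board is full. With a `game_id` in place, the rows of one game can be grouped together, for example to look at the won column per game.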