Question

I'm experimenting with artificial neural networks as tic-tac-toe players, using TensorFlow Learn (formerly Scikit Flow).

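The simulator code is here. My current simple NN-based player wins about 90% of games against a random player, but it cannot defend at all. I don't know how to tell the NN what it should not do, i.e. which moves are wrong (for example, failing to block when the opponent has two in a row). Also, it only trains on the last board states and the winning move.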
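One common way to give a network negative feedback is to stop framing the task as "predict the winning move" and instead learn a value. This is sketched here as an assumption, not as the simulator's method. For every move in every game (wins, losses and draws), record the board that the move produced and label it with the final outcome from the mover's point of view (+1 win, 0 draw, -1 loss). This is plain Monte Carlo value estimation. A regressor trained on these pairs can then score each candidate move, so moves that preceded losses get pushed down. The helpers `game_to_examples` and `pick_move` and the move-list game format are hypothetical. The board convention (0 empty, 1 player 1, -1 player 2) follows the question.

```python
import numpy as np


def game_to_examples(moves, outcome):
    """Turn one finished game into (board, target) training pairs.

    moves: list of cell indices 0..8 in play order; player 1 (+1) moves first.
    outcome: +1 if player 1 won, -1 if player 2 won, 0 for a draw.
    Each example is the board *after* a move, seen from the mover's side
    (mover's stones = +1), labelled with the final result for that mover.
    """
    board = np.zeros(9)
    X, y = [], []
    player = 1
    for cell in moves:
        board[cell] = player
        # Multiplying creates a new array, so later moves don't alias it;
        # flipping signs means the network always sees "my stones = +1"
        X.append(board * player)
        # Result of the whole game from this mover's perspective
        y.append(outcome * player)
        player = -player
    return np.array(X), np.array(y, dtype=float)


def pick_move(board, player, value_fn):
    """Return the empty cell whose resulting position value_fn rates highest.

    board: flat array of 9 cells; value_fn: maps a mover-relative board to a score.
    Only knows which cells are empty, not how a game is won.
    """
    best_cell, best_value = None, -np.inf
    for cell in np.flatnonzero(board == 0):
        candidate = board.copy()
        candidate[cell] = player
        value = value_fn(candidate * player)
        if value > best_value:
            best_cell, best_value = cell, value
    return best_cell
```

With a trained regressor, `value_fn` could wrap its prediction on `candidate.reshape(1, -1)`. Because losing games now contribute -1 targets, a move that lets the opponent complete a row gets a low score even though no rule about rows is ever coded.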

Questions:

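What is a better way to approach this (using ANNs and zero knowledge of the game rules)?
I assume an RNN/LSTM network would help. There is a short example on the TensorFlow Learn website: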

classifier = skflow.TensorFlowRNNClassifier(rnn_size=EMBEDDING_SIZE, n_classes=15, cell_type='gru', input_op_fn=input_op_fn, num_layers=1, bidirectional=False, sequence_length=None, steps=1000, optimizer='Adam', learning_rate=0.01, continue_training=True)

But I'm lost here: how would this apply to the tic-tac-toe example? What would the rnn_size=EMBEDDING_SIZE and input_op_fn params be?
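For an RNN, one plausible reading is to treat a game as a sequence of board states, one time step per move, each step being the 9 flattened cells. This is an assumption; the TensorFlow Learn example is about text, not this game. `rnn_size` sets the size of the recurrent cell. In the text example it happens to equal the word-embedding size, but for boards it would simply be a hyperparameter. `input_op_fn` would then only need to turn the batch into per-time-step inputs. The NumPy sketch below shows just that shaping step, with no TensorFlow calls. `games_to_sequences` and `split_time_steps` are hypothetical helpers, and padding with empty boards up to 9 steps is one choice among several.

```python
import numpy as np

MAX_STEPS = 9  # a tic-tac-toe game has at most 9 moves
CELLS = 9      # 3x3 board flattened


def games_to_sequences(game_histories):
    """Stack variable-length games into a (batch, MAX_STEPS, CELLS) array.

    game_histories: list of games; each game is a list of 3x3 boards
    (0 empty, 1 player 1, -1 player 2), one per move.
    Shorter games are padded at the end with empty boards; the true
    lengths are returned too, so a model could ignore the padding.
    """
    batch = np.zeros((len(game_histories), MAX_STEPS, CELLS))
    lengths = np.zeros(len(game_histories), dtype=int)
    for i, boards in enumerate(game_histories):
        steps = [np.asarray(b).reshape(CELLS) for b in boards[:MAX_STEPS]]
        if steps:  # skip empty games; their rows stay all-zero
            batch[i, :len(steps)] = np.stack(steps)
        lengths[i] = len(steps)
    return batch, lengths


def split_time_steps(batch):
    """Return a Python list with one (batch, CELLS) array per time step,
    the kind of per-step input a static (unrolled) RNN consumes."""
    return [batch[:, t, :] for t in range(batch.shape[1])]
```

With this framing the label could still be the next move (a class 0 to 8), but the network sees the whole game so far instead of only the last two boards.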

Edit: The training code looks like this (Player source):

import numpy as np

def train(self, history):

    # Collect rows only for games player A won, so that lost or drawn
    # games don't leave all-zero rows labelled 0 in the training set
    X, y = [], []

    for game in history:

        # Train only on wins of player A (+1 values)
        if game.state == game.WIN_P1:

            # Get the last two board states before the winning move
            # (two 3x3 arrays, concatenated and flattened to 18 features)
            X.append(np.concatenate([
                game.history[-2],
                game.history[-3]
            ]).flatten())

            y.append(game.last_move)

        # TODO: How can we train lost games with a DNN classifier?
        # if game.state == game.WIN_PMINUS1:

    self.classifier.fit(np.array(X), np.array(y))

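So a row in X contains the last two board states before the winning move (3x3 numpy arrays containing 0, 1 and -1 for empty, player 1 and player 2 respectively), and y contains the last move (the one that led to the win).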
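To make the row layout concrete, here is a small check of that encoding on made-up boards. The two 3x3 arrays are concatenated along the first axis (giving 6x3) and flattened into 18 features, newer state first, as in `train`. Mapping the winning move to a class index 0 to 8 as `row * 3 + col` is an assumption, since the question does not say how `game.last_move` is encoded.

```python
import numpy as np

# Hypothetical boards (0 empty, 1 player 1, -1 player 2)
# Plays the role of game.history[-3]: player 2 is about to move
before_reply = np.array([[1, 1, 0],
                         [0, -1, 0],
                         [0, 0, 0]])
# Plays the role of game.history[-2]: player 2 answered at (2, 2); player 1 to move
before_win = np.array([[1, 1, 0],
                       [0, -1, 0],
                       [0, 0, -1]])

# Same layout as train(): newer board first, then the older one
x_row = np.concatenate([before_win, before_reply]).flatten()

# Player 1 wins at row 0, col 2 -> class index (assumed encoding)
row, col = 0, 2
label = row * 3 + col
```

Every such row comes from a position that player 1 went on to win, so nothing in the data ever shows the network a position where it failed to block.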

Tic-tac-toe ANN player learning with TensorFlow?

0 answers: