
Question

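I'm writing a C++ program for the game chopsticks.

It's a really simple game with only 625 game states in total (and even fewer once you account for symmetry and unreachable states). I have read up on the minimax and alpha-beta algorithms, mostly as applied to tic-tac-toe, but the problem I ran into is that in tic-tac-toe it's impossible to loop back to a previous state, while in chopsticks that can easily happen. So when I ran the code it ended in a stack overflow.

I fixed this by adding flags for previously visited states so that they are avoided (I don't know if that's the right way to do it), but now the problem is that the output isn't symmetric as expected.

For example, in the start state of the game each player has one finger on each hand, so everything is symmetric. The program tells me that the best move is to hit my right hand with my left, but not the other way around.

My source code is -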

#include <iostream>
#include <array>
#include <vector>
#include <limits>
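std::array<int, 625> t; //Visited flags and cached scores, used when the turn == true player is to move.
std::array<int, 625> f; //Visited flags and cached scores, used when the turn == false player is to move.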
int no = 0; //Unused. For debugging.
class gamestate
{
public:
    gamestate(int x, bool t) : turn(t) //Constructor.
    {
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++) {
                val[i][j] = x % 5;
                x /= 5;
            }
        init();
    }
    void print() //Unused. For debugging.
    {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++)
                std::cout << val[i][j] << "\t";
            std::cout << "\n";
        }
        std::cout << "\n";
    }
    std::array<int, 6> canmove = {{ 1, 1, 1, 1, 1, 1 }}; //List of available moves.
    bool isover() //Is the game over.
    {
        return ended;
    }
    bool won() //Who won the game.
    {
        return winner;
    }
    bool isturn() //Whose turn it is.
    {
        return turn;
    }
    std::vector<int> choosemoves() //Choose the best possible moves in the current state.
    {
        std::vector<int> bestmoves;
        if(ended)
            return bestmoves;
        std::array<int, 6> scores;
        int bestscore;
        if(turn)
            bestscore = std::numeric_limits<int>::min();
        else
            bestscore = std::numeric_limits<int>::max();
        scores.fill(bestscore);
        for (int i = 0; i < 6; i++)
            if (canmove[i]) {
                t.fill(0);
                f.fill(0);
                gamestate *play = new gamestate(this->playmove(i),!turn);
                scores[i] = minimax(play, 0, std::numeric_limits<int>::min(), std::numeric_limits<int>::max());
                std::cout<<i<<": "<<scores[i]<<std::endl;
                delete play;
                if (turn) if (scores[i] > bestscore) bestscore = scores[i];
                if (!turn) if (scores[i] < bestscore) bestscore = scores[i];
            }
        for (int i = 0; i < 6; i++)
            if (scores[i] == bestscore)
                bestmoves.push_back(i);
        return bestmoves;
    }
private:
    std::array<std::array<int, 2>, 2 > val; //The values of the fingers.
    bool turn; //Whose turn it is.
    bool ended = false; //Has the game ended.
    bool winner; //Who won the game.
    void init() //Check if the game has ended and find the available moves.
    {
        if (!(val[turn][0]) && !(val[turn][1])) {
            ended = true;
            winner = !turn;
            canmove.fill(0);
            return;
        }
        if (!(val[!turn][0]) && !(val[!turn][1])) {
            ended = true;
            winner = turn;
            canmove.fill(0);
            return;
        }
        if (!val[turn][0]) {
            canmove[0] = 0;
            canmove[1] = 0;
            canmove[2] = 0;
            if (val[turn][1] % 2)
                canmove[5] = 0;
        }
        if (!val[turn][1]) {
            if (val[turn][0] % 2)
                canmove[2] = 0;
            canmove[3] = 0;
            canmove[4] = 0;
            canmove[5] = 0;
        }
        if (!val[!turn][0]) {
            canmove[0] = 0;
            canmove[3] = 0;
        }
        if (!val[!turn][1]) {
            canmove[1] = 0;
            canmove[4] = 0;
        }
    }
    int playmove(int mov) //Play a move to get the next game state.
    {
        auto newval = val;
        switch (mov) {
        case 0:
            newval[!turn][0] = (newval[turn][0] + newval[!turn][0]);
            newval[!turn][0] = (5 > newval[!turn][0]) ? newval[!turn][0] : 0;
            break;
        case 1:
            newval[!turn][1] = (newval[turn][0] + newval[!turn][1]);
            newval[!turn][1] = (5 > newval[!turn][1]) ? newval[!turn][1] : 0;
            break;
        case 2:
            if (newval[turn][1]) {
                newval[turn][1] = (newval[turn][0] + newval[turn][1]);
                newval[turn][1] = (5 > newval[turn][1]) ? newval[turn][1] : 0;
            } else {
                newval[turn][0] /= 2;
                newval[turn][1] = newval[turn][0];
            }
            break;
        case 3:
            newval[!turn][0] = (newval[turn][1] + newval[!turn][0]);
            newval[!turn][0] = (5 > newval[!turn][0]) ? newval[!turn][0] : 0;
            break;
        case 4:
            newval[!turn][1] = (newval[turn][1] + newval[!turn][1]);
            newval[!turn][1] = (5 > newval[!turn][1]) ? newval[!turn][1] : 0;
            break;
        case 5:
            if (newval[turn][0]) {
                newval[turn][0] = (newval[turn][1] + newval[turn][0]);
                newval[turn][0] = (5 > newval[turn][0]) ? newval[turn][0] : 0;
            } else {
                newval[turn][1] /= 2;
                newval[turn][0] = newval[turn][1];
            }
            break;
        default:
            std::cout << "\nInvalid move!\n";
        }
        int ret = 0;
        for (int i = 1; i > -1; i--)
            for (int j = 1; j > -1; j--) {
                ret+=newval[i][j];
                ret*=5;
            }
        ret/=5;
        return ret;
    }
    static int minimax(gamestate *game, int depth, int alpha, int beta) //Minimax searching function with alpha beta pruning.
    {
        if (game->isover()) {
            if (game->won())
                return 1000 - depth;
            else
                return depth - 1000;
        }
        if (game->isturn()) {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]&&t[game->playmove(i)]!=-1) {
                    int score;
                    if(!t[game->playmove(i)]){
                        t[game->playmove(i)] = -1;
                        gamestate *play = new gamestate(game->playmove(i),!game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        t[game->playmove(i)] = score;
                    }
                    else
                        score = t[game->playmove(i)];
                    if (score > alpha) alpha = score;
                    if (alpha >= beta) break;
                }
            return alpha;
        } else {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]&&f[game->playmove(i)]!=-1) {
                    int score;
                    if(!f[game->playmove(i)]){
                        f[game->playmove(i)] = -1;
                        gamestate *play = new gamestate(game->playmove(i),!game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        f[game->playmove(i)] = score;
                    }
                    else
                        score = f[game->playmove(i)];
                    if (score < beta) beta = score;
                    if (alpha >= beta) break;
                }
            return beta;
        }
    }
};
int main(void)
{
    gamestate test(243, true);
    auto movelist = test.choosemoves();
    for(auto i: movelist)
        std::cout<<i<<std::endl;
    return 0;
}

I encode each state as a base-5 number stored in an ordinary int, since each hand can hold a value from 0 to 4.
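For reference, the encoding can be sketched as a pair of small helpers. The names `Hands`, `decode` and `encode` are hypothetical. The digit order is copied from the constructor and from `playmove` in the code above, so 243 (the value used in `main`) decodes to the rows {3, 3} and {4, 1}.

```cpp
#include <array>

// Hypothetical alias for the 2x2 table of finger counts (val in the question).
using Hands = std::array<std::array<int, 2>, 2>;

// Split a state code into four base-5 digits. The least significant digit
// is val[0][0], followed by val[0][1], val[1][0] and val[1][1].
Hands decode(int code) {
    Hands val{};
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++) {
            val[i][j] = code % 5;
            code /= 5;
        }
    return val;
}

// Inverse of decode: rebuild the code, most significant digit first.
int encode(const Hands& val) {
    int code = 0;
    for (int i = 1; i >= 0; i--)
        for (int j = 1; j >= 0; j--)
            code = code * 5 + val[i][j];
    return code;
}
```

Keeping both directions in one place makes it easy to check that every code from 0 to 624 survives a round trip.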

In the code I have entered the state -

3    3

4    1

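The output says I should hit the opponent's right hand (3) with my right hand (1), but it doesn't say I should hit the opponent's left hand (also 3).

I think the problem comes from the way I handle infinite loops.

What would be the right way to do this? Or, if this is the right way, how do I fix the problem?

Also, please let me know how I can improve my code.

Thanks a lot.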

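Edit

I have changed my minimax function as follows to make sure that infinite loops are scored above losing, but I'm still not getting symmetry. I also wrote a function to add depth to a score: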

    //Note: this version also needs #include <cmath> for std::atan and std::acos.
    static float minimax(gamestate *game, int depth, float alpha, float beta) //Minimax searching function with alpha beta pruning.
    {
        if (game->isover()) {
            if (game->won())
                return 1000 - std::atan(depth) * 2000 / std::acos(-1);
            else
                return std::atan(depth) * 2000 / std::acos(-1) - 1000;
        }
        if (game->isturn()) {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]) {
                    float score;
                    if(!t[game->playmove(i)]) {
                        t[game->playmove(i)] = -1001;
                        gamestate *play = new gamestate(game->playmove(i), !game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        t[game->playmove(i)] = score;
                    } else if(t[game->playmove(i)] == -1001)
                        score = 0;
                    else
                        score = adddepth(t[game->playmove(i)], depth);
                    if (score > alpha) alpha = score;
                    if (alpha >= beta) break;
                }
            return alpha;
        } else {
            for (int i = 0; i < 6; i++)
                if (game->canmove[i]) {
                    float score;
                    if(!f[game->playmove(i)]) {
                        f[game->playmove(i)] = -1001;
                        gamestate *play = new gamestate(game->playmove(i), !game->isturn());
                        score = minimax(play, depth + 1, alpha, beta);
                        delete play;
                        f[game->playmove(i)] = score;
                    } else if(f[game->playmove(i)] == -1001)
                        score = 0;
                    else
                        score = adddepth(f[game->playmove(i)], depth);
                    if (score < beta) beta = score;
                    if (alpha >= beta) break;
                }
            return beta;
        }
    }

This is the function for adding depth -

float adddepth(float score, int depth) //Add depth to pre-calculated score.
{
    int olddepth;
    float newscore;
    if(score > 0) {
        olddepth = std::tan((1000 - score) * std::acos(-1) / 2000);
        depth += olddepth;
        newscore = 1000 - std::atan(depth) * 2000 / std::acos(-1);
    } else {
        olddepth = std::tan((1000 + score) * std::acos(-1) / 2000);
        depth += olddepth;
        newscore = std::atan(depth) * 2000 / std::acos(-1) - 1000;
    }
    return newscore;
}
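One detail in `adddepth` may be worth checking: the result of `std::tan` is stored in an `int`, so a value that comes out as 2.9999 rather than 3 would be truncated to 2. Below is a minimal sketch of the same mapping with explicit rounding. The names `depthtoscore` and `scoretodepth` are hypothetical, only the winning (positive) branch is shown, and `double` is used instead of `float` as an assumption.

```cpp
#include <cmath>

const double PI = std::acos(-1.0);

// Same mapping as the winning branch in the question: depth 0 gives 1000,
// and the score shrinks towards 0 as the depth grows.
double depthtoscore(int depth) {
    return 1000.0 - std::atan(static_cast<double>(depth)) * 2000.0 / PI;
}

// Inverse mapping. std::lround rounds to the nearest integer, so a small
// floating-point error does not knock the recovered depth down by one.
int scoretodepth(double score) {
    return static_cast<int>(std::lround(std::tan((1000.0 - score) * PI / 2000.0)));
}
```

The losing branch would mirror this with the sign flipped, as in the original function.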

Answer 1

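Disclaimer: I don't know C++, ~~and frankly I haven't bothered to read the rules of the game~~. I have now read the rules and still stand by what I said... but I still don't know C++. Even so, I can offer some general knowledge about the algorithm that should set you off in the right direction.

Asymmetry is not in itself a bad thing. If two moves are exactly equivalent, the program should pick one of them rather than stand there helpless like Buridan's ass. In fact, you should make sure that any agent you write has some way of choosing arbitrarily between policies it cannot tell apart.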
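One way to make that arbitrary choice explicit is to pick uniformly at random among the tied moves. This is only a sketch: `pickmove` is a hypothetical helper, and it assumes a non-empty list of equally good moves such as the vector returned by `choosemoves()` in the question.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: return one element of bestmoves, chosen uniformly
// at random. The caller must pass a non-empty vector.
int pickmove(const std::vector<int>& bestmoves, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, bestmoves.size() - 1);
    return bestmoves[pick(rng)];
}
```

Seeding the `std::mt19937` with a fixed value keeps games reproducible while debugging; seeding it from `std::random_device` gives different play each run.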

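You should think more carefully about the utility scheme implied by refusing to revisit previous states. Pursuing an infinite loop is a valid policy, even if your current representation of it crashes the program; maybe the bug is the overflow, not the policy that caused it. Given the choice between losing the game and refusing to let the game end, which do you want your agent to prefer?

Playing ad infinitum

If you want your agent to avoid losing at all costs (that is, you want it to prefer indefinite play over a loss), then I would suggest treating any repeated state as a terminal state and assigning it a value somewhere between winning and losing. After all, in a sense it is terminal: this is the loop the game will stay in forever, and its definite result is that there is no winner. Keep in mind, though, that if you are using simple minimax (one utility function, not two), this implies that your opponent also regards eternal play as a middling result.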
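The idea can be sketched on a toy game graph. Everything here is an assumption made for illustration: `Node`, the constants and the `onpath` marker are not taken from the question's `gamestate` class, each node number is assumed to already include whose turn it is, and a state is marked only while it lies on the current line of play (and unmarked on the way back up), which differs from the global flags used in the question. Alpha-beta pruning and score caching are left out to keep the sketch short.

```cpp
#include <algorithm>
#include <vector>

// Assumed toy representation: each node lists its successor states.
// A node with no successors is terminal and carries its own value.
struct Node {
    std::vector<int> next;
    int value; // only meaningful for terminal nodes
};

const int WIN = 1000, LOSS = -1000, DRAW = 0; // DRAW sits between win and loss

// onpath[s] is true while state s is on the current line of play.
int minimax(const std::vector<Node>& g, int s, bool maximizing,
            std::vector<bool>& onpath) {
    if (g[s].next.empty()) return g[s].value; // real end of the game
    if (onpath[s]) return DRAW;               // repeated state: terminal draw
    onpath[s] = true;
    int best = maximizing ? LOSS : WIN;
    for (int n : g[s].next) {
        int score = minimax(g, n, !maximizing, onpath);
        best = maximizing ? std::max(best, score) : std::min(best, score);
    }
    onpath[s] = false;                        // unmark when backing out
    return best;
}
```

With this scoring, a player whose only alternative to repeating a position is a losing move will choose the repetition, which is exactly the preference described above.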

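It may sound ridiculous, but playing on into infinity may actually be a reasonable policy. Remember that minimax assumes the worst case: a perfectly rational foe whose interests are the exact opposite of yours. If, for example, you are writing an agent to play against a human, then the human will either make a logical mistake or eventually decide they would rather end the game by losing, so your agent will benefit from patiently staying in this Nash equilibrium loop!

Alright, let's end the game already

If you want your agent to prefer that the game eventually ends, then I would suggest implementing a living penalty: a modifier added to your utility that decreases as a function of time (whether asymptotically or without bound). Implemented carefully, this can guarantee that, eventually, any ending is preferable to playing another turn. With this solution too, you need to think carefully about what preferences it implies for your opponent.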
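A minimal sketch of an unbounded, linear living penalty, seen from the maximizer's side only. The function names and the constants are hypothetical. Note the caveat from the paragraph above: in single-utility minimax the minimizer reads the same numbers, so it may be rewarded for dragging the game out.

```cpp
const int WIN = 1000, LOSS = -1000;

// Value of a finished game after `turns` turns: the base result minus a
// fixed cost for every turn played.
int terminalvalue(bool won, int turns, int penalty) {
    return (won ? WIN : LOSS) - penalty * turns;
}

// Value of a line of play that is still going after `turns` turns.
// Because it falls without bound, it eventually drops below any loss.
int ongoingvalue(int turns, int penalty) {
    return -penalty * turns;
}
```

With a penalty of 10 per turn, still playing after 50 turns (-500) beats losing at turn 5 (-1050), but still playing after 200 turns (-2000) does not.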

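A third way

Another common solution is to depth-limit your search and implement an evaluation function. It takes a game state as input and simply returns a utility value that is its best guess at the final result. Is this provably optimal? No, not unless your evaluation function just completes the minimax search, but it does mean that your algorithm finishes in a reasonable time. By burying this rough estimate deep enough in the tree, you end up with a pretty reasonable model. However, this produces an incomplete policy, which makes it more useful for a replanning agent than for a standard planning agent. Minimax with replanning is the usual approach for complex games (it is, if I'm not mistaken, the basic algorithm followed by Deep Blue), but since this is a very simple game you probably don't need to go this way.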
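A depth-limited search can be sketched on a toy game graph as follows. The `Node` layout and `limitedminimax` are assumptions made for illustration, not the question's code: each node stores either its exact result (if terminal) or a heuristic guess standing in for the evaluation function.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Assumed toy representation of a game graph.
struct Node {
    std::vector<int> next; // successor states; empty means the game is over
    int value;             // exact result if terminal, heuristic guess otherwise
};

// Plain minimax that stops after depthleft plies and falls back on the
// stored guess, so it terminates even when the graph contains cycles.
int limitedminimax(const std::vector<Node>& g, int s, bool maximizing,
                   int depthleft) {
    if (g[s].next.empty() || depthleft == 0)
        return g[s].value;
    int best = maximizing ? std::numeric_limits<int>::min()
                          : std::numeric_limits<int>::max();
    for (int n : g[s].next) {
        int score = limitedminimax(g, n, !maximizing, depthleft - 1);
        best = maximizing ? std::max(best, score) : std::min(best, score);
    }
    return best;
}
```

For chopsticks, the heuristic guess could be something as simple as the difference in the number of live hands, though with only 625 states a full search is affordable.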

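A side note on abstraction

Note that all of these solutions are framed as either numeric changes to the utility function or estimates of it. In general, this is preferable to arbitrarily throwing away possible policies. After all, that is what your utility function is for: any time you make a policy decision on the basis of anything other than the numeric value of your utility, you are breaking your abstraction and making your code less robust.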

Minimax with alpha-beta pruning problem
