Question

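I am creating a really naive AI for a board game I am making (it maybe shouldn't even be called an AI, as it just tests a lot of possibilities and picks the best one). The purpose is to reduce the amount of manual testing I need to do to balance the game.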

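The AI plays alone and does the following: on each turn the AI, playing one of the heroes, attacks one of the up to 9 monsters on the battlefield. Its goal is to finish the battle as fast as possible (in the fewest turns) and with the fewest monster activations.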

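To achieve this, I have implemented a think-ahead algorithm for the AI: instead of performing the best possible move at the moment, it selects a move based on the possible outcomes of the future moves of the other heroes. This is the code where it does that, written in PHP: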

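/**
 * Perform think-ahead moves.
 *
 * @param int         $thinkAheadLeft      (the number of think-ahead moves left)
 * @param int         $innerIterator       (the iterator for the move; used as the hero index)
 * @param array       $performedMoves      (the moves performed so far)
 * @param Battlefield $originalBattlefield (the previous state of the Battlefield)
 * @param string      $tabs                (indentation prefix for the debug output)
 *
 * @return int (the best quantification found from this state)
 */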
public function performThinkAheadMoves($thinkAheadLeft, $innerIterator, $performedMoves, $originalBattlefield, $tabs) {
    if ($thinkAheadLeft == 0) return $this->quantify($originalBattlefield);

    $nextThinkAhead = $thinkAheadLeft-1;
    $moves = $this->getPossibleHeroMoves($innerIterator, $performedMoves);
    $Hero = $this->getHero($innerIterator);
    $innerIterator++;
    $nextInnerIterator = $innerIterator;
    foreach ($moves as $moveid => $move) {
        $performedUpFar = $performedMoves;
        $performedUpFar[] = $move;
        $attack = $Hero->getAttack($move['attackid']);
        $monsters = array();
        foreach ($move['targets'] as $monsterid) $monsters[] = $originalBattlefield->getMonster($monsterid)->getName();
        if (self::$debug) echo $tabs . "Testing sub move of " . $Hero->Name. ": $moveid of " . count($moves) . "  (Think Ahead: $thinkAheadLeft | InnerIterator: $innerIterator)\n";

        $moves[$moveid]['battlefield']['after']->performMove($move);

        if (!$moves[$moveid]['battlefield']['after']->isBattleFinished()) {
            if ($innerIterator == count($this->Heroes)) {
                $moves[$moveid]['battlefield']['after']->performCleanup();
                $nextInnerIterator = 0;
            }
            // Recurse on the resulting battlefield; the call passes exactly the five parameters the method declares.
            $moves[$moveid]['quantify'] = $moves[$moveid]['battlefield']['after']->performThinkAheadMoves($nextThinkAhead, $nextInnerIterator, $performedUpFar, $originalBattlefield, $tabs."\t");
        } else $moves[$moveid]['quantify'] = $moves[$moveid]['battlefield']['after']->quantify($originalBattlefield);
    }

    // Sort descending by score so the best outcome ends up first.
    usort($moves, function ($a, $b) {
        return $b['quantify'] <=> $a['quantify'];
    });

    return $moves[0]['quantify'];
}

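What this does is recursively check future moves until the $thinkAheadLeft value is reached, or until a solution is found (i.e. all monsters are defeated). When it reaches its exit condition, it quantifies the state of the battlefield compared to $originalBattlefield (the battlefield state before the first move). The quantification is done as follows: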

/**
 * Quantify the current state of the battlefield.
 *
 * @param Battlefield $originalBattlefield (the original battlefield)
 *
 * @return int (the battlefield quantification)
 */
public function quantify(Battlefield $originalBattlefield) {

    $points = 0;
    foreach ($originalBattlefield->Monsters as $originalMonsterId => $OriginalMonster) {
        $CurrentMonster = $this->getMonster($originalMonsterId);

        $monsterActivated = $CurrentMonster->getActivations() - $OriginalMonster->getActivations();
        $points+=$monsterActivated*($this->quantifications['activations'] + $this->quantifications['activationsPenalty']);

        if ($CurrentMonster->isDead()) $points+=$this->quantifications['monsterKilled']*$CurrentMonster->Priority;
        else {
            $enragePenalty = floor($this->quantifications['activations'] * (($CurrentMonster->Enrage['max'] - $CurrentMonster->Enrage['left'])/$CurrentMonster->Enrage['max']));

            $points+=($OriginalMonster->Health['left'] - $CurrentMonster->Health['left']) * $this->quantifications['health'];
            $points+=(($CurrentMonster->Enrage['max'] - $CurrentMonster->Enrage['left']))*$enragePenalty;
        }
    }

    return $points;
}

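In the quantification, some things contribute net positive points to the state and others contribute net negative points. Instead of using the points calculated right after the current move to decide which move to take, the AI uses the points calculated after the think-ahead portion and selects the move based on the possible moves of the other heroes.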

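Basically, the AI is saying that attacking Monster 1 may not be the best option right now, but if the other heroes then perform this and that action, it gives the best outcome in the long run.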

After selecting a move, the AI performs a single move with the hero and then repeats the process for the next hero, calculating with +1 moves.

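Question: My problem is that I assumed an AI that thinks ahead 3-4 moves should find a better solution than an AI that only performs the best possible move at the moment. My test cases show otherwise: in some cases an AI that does not use the think-ahead option (i.e. it only plays the best possible move at the moment) beats an AI that thinks ahead just 1 move. Sometimes an AI that thinks ahead only 3 moves beats an AI that thinks ahead 4 or 5 moves. Why is this happening? Is my assumption incorrect, and if so, why? Am I using the wrong numbers for the weights? I am investigating this and running a test that automatically calculates the weights to use, trying intervals of possible weights and keeping the best outcome (i.e. the one that yields the fewest turns and/or the fewest activations), but the problem described above still occurs with those weights.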

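With the current version of the script I am limited to a 5-move think-ahead, because with any larger think-ahead number the script gets really slow (with a 5-move think-ahead it finds a solution in roughly 4 minutes, but with a 6-move think-ahead it did not even find the first possible move in 6 hours).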
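The jump from minutes to hours is what an exhaustive search would be expected to produce, since the number of states grows geometrically with depth. The following is a minimal sketch of that growth. The function name `searchNodes` and the branching factor of 20 moves per hero are hypothetical; the real branching factor depends on the attacks and targets available and is not stated in the post.

```php
<?php
// Count the states an exhaustive depth-limited search visits,
// assuming every state offers the same number of moves.
// ($branching = 20 below is an illustrative guess, not a measured value.)
function searchNodes(int $branching, int $depth): int
{
    $total = 0;
    for ($ply = 0; $ply <= $depth; $ply++) {
        $total += $branching ** $ply; // states reached after $ply moves
    }
    return $total;
}

$depth5 = searchNodes(20, 5); // 3368421
$depth6 = searchNodes(20, 6); // 67368421
```

Under that assumption, one extra move of think-ahead multiplies the work by roughly the branching factor. The timings quoted above (about 4 minutes versus more than 6 hours) correspond to a factor of at least 90.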

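How the fight works: a number of heroes (2-4) controlled by the AI, each having a number of different attacks (1-x) that can be used once or multiple times in a combat, attack a number of monsters (1-9). Based on the values of the attack, the monsters lose health until they die. After each attack, the attacked monster gets enraged if it did not die, and after every hero has performed a move, all monsters get enraged. When a monster reaches its enrage limit, it activates.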
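The enrage rules described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the array shape (`health`, `enrage`, `enrageMax`, `activations`) and the function names are hypothetical, each enrage step is assumed to add 1, and the enrage counter is assumed to reset after an activation, none of which the post specifies.

```php
<?php
// Hypothetical monster shape:
// ['health' => int, 'enrage' => int, 'enrageMax' => int, 'activations' => int]

// Enrage a living monster by one step; activate it when the limit is reached.
function enrage(array $monster): array
{
    if ($monster['health'] <= 0) {
        return $monster; // dead monsters neither enrage nor activate
    }
    $monster['enrage']++;
    if ($monster['enrage'] >= $monster['enrageMax']) {
        $monster['activations']++;
        $monster['enrage'] = 0; // assumption: the counter resets on activation
    }
    return $monster;
}

// A hero attack: apply damage, then enrage the target if it survived.
function attack(array $monsters, int $target, int $damage): array
{
    $monsters[$target]['health'] -= $damage;
    $monsters[$target] = enrage($monsters[$target]);
    return $monsters;
}

// After every hero has moved, all surviving monsters get enraged.
function endOfRound(array $monsters): array
{
    return array_map('enrage', $monsters);
}

$monsters = [
    ['health' => 5, 'enrage' => 0, 'enrageMax' => 2, 'activations' => 0],
    ['health' => 3, 'enrage' => 0, 'enrageMax' => 2, 'activations' => 0],
];
$monsters = attack($monsters, 0, 2); // survives, enrage goes to 1
$monsters = attack($monsters, 1, 3); // dies, no enrage
$monsters = endOfRound($monsters);   // monster 0 reaches its limit and activates
```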

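Disclaimer: I know that PHP is not the language to use for this kind of operation, but as this is only an in-house project, I preferred to sacrifice speed in order to code this as fast as possible in my native programming language.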

Update: The quantifications we are currently using look like this:

$Battlefield->setQuantification(array(
 'health'                   =>  16,
 'monsterKilled'            =>  86,
 'activations'              =>  -46,
 'activationsPenalty'       =>  -10
));
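To see what these weights imply, the per-monster terms of quantify() can be worked through by hand. The sketch below mirrors those terms in a standalone helper. `scoreMonster` is a hypothetical name, and the Priority of 1 and the enrage limit of 4 are assumptions, since the post does not give those values.

```php
<?php
// Weights copied from the setQuantification() call above.
$q = [
    'health'             => 16,
    'monsterKilled'      => 86,
    'activations'        => -46,
    'activationsPenalty' => -10,
];

// Hypothetical helper that mirrors the per-monster terms in quantify().
function scoreMonster(array $q, int $activations, bool $dead, int $priority,
                      int $damage, int $enrageMax, int $enrageLeft): int
{
    $points = $activations * ($q['activations'] + $q['activationsPenalty']);
    if ($dead) {
        // As in quantify(): a dead monster earns only the kill bonus.
        return $points + $q['monsterKilled'] * $priority;
    }
    $enrageGained  = $enrageMax - $enrageLeft;
    $enragePenalty = (int) floor($q['activations'] * ($enrageGained / $enrageMax));
    return $points + $damage * $q['health'] + $enrageGained * $enragePenalty;
}

$killed    = scoreMonster($q, 0, true, 1, 0, 4, 4);  // 86
$wounded   = scoreMonster($q, 0, false, 1, 5, 4, 2); // 5*16 + 2*(-23) = 34
$activated = scoreMonster($q, 1, false, 1, 5, 4, 4); // -56 + 5*16 = 24
$heavyHit  = scoreMonster($q, 0, false, 1, 6, 4, 4); // 6*16 = 96
```

Under these assumptions, dealing 6 damage without a kill scores 96, which is more than the 86 awarded for killing a Priority 1 monster, because the damage term is dropped once the monster is dead. Whether that matters in practice depends on the actual Priority and health values in the game.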

Answer 1

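If there is randomness in your game, then anything can happen. I am pointing that out because it is not clear from the material you have posted here.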

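If there is no randomness and the actors can see the full state of the game, then a longer look-ahead absolutely should perform better. When it does not, that is a clear indication that your evaluation function is providing incorrect estimates of the value of a state.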

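Looking at your code, the values of your quantifications are not listed, and in your simulation it looks like you just have the same player make moves repeatedly without considering the possible actions of the other actors. You need to run a full simulation, step by step, in order to produce accurate future states, and you need to look at the value estimates for the various states to see whether you agree with them, adjusting your quantifications accordingly.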
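One simple way to inspect the value estimates is to print the candidate moves ranked by score and check whether the ordering matches your own judgement of the position. This is a minimal sketch: the `quantify` key comes from the question's code, while the `label` key and the function name `rankMoves` are hypothetical.

```php
<?php
// Return one formatted line per move, best score first.
// Each move is assumed to look like ['label' => string, 'quantify' => int].
function rankMoves(array $moves): array
{
    usort($moves, fn($a, $b) => $b['quantify'] <=> $a['quantify']);
    return array_map(
        fn($m) => sprintf('%-20s %6d', $m['label'], $m['quantify']),
        $moves
    );
}

$lines = rankMoves([
    ['label' => 'attack monster 1', 'quantify' => 34],
    ['label' => 'attack monster 2', 'quantify' => 86],
    ['label' => 'attack monster 3', 'quantify' => -56],
]);
echo implode("\n", $lines), "\n";
```

If a move you consider clearly bad ranks near the top, that points at the weights or at the terms of the evaluation rather than at the search depth.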

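An alternative way to frame the problem of estimating value is to explicitly predict your chance of winning the round on a scale of 0.0 to 1.0, and then choose the move that gives you the highest chance of winning. Calculating the damage done and the number of monsters killed so far does not tell you much about how much you have left to do in order to win the game.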
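As a rough illustration of scoring what remains rather than what has been done, the sketch below maps a battlefield to a value between 0.0 and 1.0 based on the share of total monster health already removed. It is a crude proxy and not a calibrated win probability; the function name `progressEstimate` and the keys `maxHealth` and `healthLeft` are hypothetical.

```php
<?php
// Fraction of total monster health already removed, in [0.0, 1.0].
// Each monster is assumed to look like ['maxHealth' => int, 'healthLeft' => int].
function progressEstimate(array $monsters): float
{
    $total = 0;
    $left  = 0;
    foreach ($monsters as $monster) {
        $total += $monster['maxHealth'];
        $left  += max(0, $monster['healthLeft']); // overkill does not count extra
    }
    // With no monsters (or no health) there is nothing left to do.
    return $total > 0 ? 1.0 - $left / $total : 1.0;
}

$halfway = progressEstimate([
    ['maxHealth' => 10, 'healthLeft' => 10],
    ['maxHealth' => 10, 'healthLeft' => 0],
]); // 0.5
```

A fuller estimate would presumably also account for enrage levels and expected activations, since those are part of the stated goal.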

What is wrong with this AI?
