Question

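I want to implement the following algorithm:

1. If there are fewer than Config.MAX_ACTION sticks remaining, the chooser must pick the minimum number of sticks (Config.MIN_ACTION).
2. For Config.MAX_ACTION or more sticks remaining, pick based on the actionRanking parameter.
3. The actionRanking array has one element for each possible action. Index 0 corresponds to Config.MIN_ACTION and the highest index corresponds to Config.MAX_ACTION.
4. For example, if Config.MIN_ACTION is 1 and Config.MAX_ACTION is 3, an action can be to pick up 1, 2 or 3 sticks.
5. actionRanking[0] corresponds to 1, actionRanking[1] corresponds to 2, etc. The higher an action's element is compared with the other elements, the more likely that action should be chosen.
6. First calculate the total number of possibilities by summing all the element values. Then choose a particular action based on the relative frequency of the various rankings.
7. For example, if Config.MIN_ACTION is 1 and Config.MAX_ACTION is 3: if the action rankings are {9,90,1}, the total is 100. Since actionRanking[0] is 9, the action of picking up 1 should be chosen about 9/100 times. 2 should be chosen about 90/100 times and 3 should be chosen about 1/100 times.
8. Use the Config.RNG.nextInt(?) method to generate appropriate random numbers.

sticksRemaining is the number of sticks remaining.
actionRanking holds the counts of each action to take. Index 0 corresponds to Config.MIN_ACTION and the highest index corresponds to Config.MAX_ACTION.
The method returns the number of sticks to pick up. It returns 0 under the following conditions: actionRanking is null, actionRanking has a length of 0, or sticksRemaining is <= 0.

The code I wrote is as follows:

static int aiChooseAction(int sticksRemaining, int[] actionRanking) {
    if (actionRanking == null || actionRanking.length == 0 || sticksRemaining <= 0)
        return 0;
    else if (sticksRemaining < Config.MAX_ACTION)
        return Config.MIN_ACTION; // TODO change to appropriate value
    else {
        int max = Integer.MIN_VALUE;
        int index = 0;
        for (int i = 0; i < actionRanking.length; i++) {
            if (actionRanking[i] >= max) {
                max = actionRanking[i];
                index = i;
            }
        }
        if (sticksRemaining < max)
            return index + 1;
        else
            return Config.RNG.nextInt(Config.MAX_ACTION) + Config.MIN_ACTION;
    }
}

The code to test this function is as follows: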

private static void testAiChooseAction() {
        boolean error = false;

        // 1.
        int action = Sticks.aiChooseAction(0, null);
        if (action != 0) {
            error = true;
            System.out.println("testAiChooseAction 1: for 0 sticks or null " 
                    + "actionRanking, response should be 0.");
        }

        // 2.
        int[] actionRanking = new int[] { 1, 100, 0 };
        action = Sticks.aiChooseAction(-5, actionRanking);
        if (action != 0) {
            error = true;
            System.out.println("testAiChooseAction 2: for negative sticks," 
                    + " response should be 0.");
        }

        // 3.
        action = Sticks.aiChooseAction(10, actionRanking);
        if (action < Config.MIN_ACTION || action > Config.MAX_ACTION) {
            error = true;
            System.out.println("testAiChooseAction 3: invalid action " 
                    + action);
        }

        // 4.
        // create and initialize to 0 an action ranking array
        actionRanking = new int[NUM_ACTIONS];

        // set the highest index to the highest ranking
        // so we expect the MAX_ACTION to be chosen
        actionRanking[actionRanking.length - 1] = 100;

        action = Sticks.aiChooseAction(10, actionRanking);

        if (action != Config.MAX_ACTION) {
            error = true;
            System.out.println("testAiChooseAction 4: expected " 
                    + Config.MAX_ACTION + " rather than " + action);
        }

        // 5.
        actionRanking = new int[] { 1, 6, 3 }; // test for 3 actions
        int[] responses = new int[actionRanking.length];

        // set seed to get repeatable "random" values
        Config.RNG.setSeed(123); 

        // call a bunch of times so there is reasonable chance of seeing the
        // expected distribution.
        for (int i = 0; i < 10000; i++) {
            action = Sticks.aiChooseAction(10, actionRanking);
            responses[action - Config.MIN_ACTION]++;
        }
        if (responses[0] != 1037 || responses[1] != 5819 
                || responses[2] != 3144) {
            error = true;
            System.out.println("testAiChooseAction 5: for seed 123 "
                    + "responses were expected to be [1037, 5819, 3144] " 
                    + " but found " + Arrays.toString(responses));

        }

        // can you think of other tests that would be useful?
        // if so, then you can add them.

        if (error) {
            System.out.println("testAiChooseAction: failed");
        } else {
            System.out.println("testAiChooseAction: passed");
        }
    }

But it does not pass the test. The error is as follows:

testAiChooseAction 5: for seed 123 responses were expected to be [1037, 5819, 3144]  but found [3327, 3370, 3303]
testAiChooseAction: failed

How can I fix this error? Please help.

Answer 1

The result you are seeing is exactly what should be expected. The random number distribution is correct for the seed you supplied:

Random r = new Random();
r.setSeed(123);
int [] count = { 0, 0, 0 };
for(int i = 0; i < 10000; i++) {
    count[r.nextInt(3)]++;
}
System.out.println(Arrays.toString(count));

This produces:

[3327, 3370, 3303]

The problem is that your code makes no attempt to weight the random responses. You should change this:

return Config.RNG.nextInt(Config.MAX_ACTION)+Config.MIN_ACTION;

and do the following instead.

At some stage before choosing the index, calculate the total weight of all the actions:

int totalWeight = 0;
for(int i = 0; i < actionRanking.length; i++) {
    totalWeight += actionRanking[i];
}

Then, instead of picking an index at random, pick a weight at random and determine which index that weight corresponds to:

int selection = Config.RNG.nextInt(totalWeight)+Config.MIN_ACTION;
int weight = Config.MIN_ACTION;

for(int i = 0; i < actionRanking.length - 1; i++) {
    weight += actionRanking[i];
    if (selection < weight) {
        return i + Config.MIN_ACTION;
    }
}
return Config.MAX_ACTION;

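Testing shows that this comes closer to your expected output, but it is not identical ([2959, 5998, 1043]), possibly because it uses nextInt(10) rather than nextInt(3). You may want to change the way the expected results are calculated.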
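One way to make such a check less dependent on the exact sequence of random values is to compare the observed counts against the expected proportions with a tolerance. The following is only a minimal sketch under stated assumptions: the class WeightedPickCheck, its methods, the local MIN_ACTION constant and the tolerance of 300 are all hypothetical and are not part of the original assignment or its test.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical harness; every name here is illustrative only.
class WeightedPickCheck {
    static final int MIN_ACTION = 1; // assumed stand-in for Config.MIN_ACTION

    // Picks an action with probability weights[i] / sum(weights),
    // assuming all weights are non-negative.
    static int pick(int[] weights, Random rng) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        if (total <= 0) {
            // nextInt needs a positive bound; assumed fallback to the smallest action
            return MIN_ACTION;
        }
        int selection = rng.nextInt(total); // uniform in [0, total)
        int cumulative = 0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (selection < cumulative) {
                return MIN_ACTION + i;
            }
        }
        throw new IllegalStateException("not reached for non-negative weights");
    }

    // Counts how often each action is picked over the given number of trials.
    static int[] sample(int[] weights, int trials, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[weights.length];
        for (int i = 0; i < trials; i++) {
            counts[pick(weights, rng) - MIN_ACTION]++;
        }
        return counts;
    }

    // True if every count lies within `tolerance` of its expected share.
    static boolean closeToExpected(int[] weights, int[] counts, int trials, double tolerance) {
        int total = Arrays.stream(weights).sum();
        for (int i = 0; i < weights.length; i++) {
            double expected = (double) trials * weights[i] / total;
            if (Math.abs(counts[i] - expected) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] weights = {1, 6, 3};
        int[] counts = sample(weights, 10000, 123L);
        System.out.println(Arrays.toString(counts));
        System.out.println(closeToExpected(weights, counts, 10000, 300.0));
    }
}
```

With rankings {1, 6, 3} and 10000 trials, the expected shares are 1000, 6000 and 3000, so a tolerance-based check passes for any correct weighting scheme regardless of how the random values are consumed.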

Answer 2

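The key is item 7:

For example, if Config.MIN_ACTION is 1 and Config.MAX_ACTION is 3: if the action rankings are {9,90,1}, the total is 100. Since actionRanking[0] is 9, the action of picking up 1 should be chosen about 9/100 times. 2 should be chosen about 90/100 times and 3 should be chosen about 1/100 times.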

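Here is how that example could be implemented:

First generate a random number between 0 and 99 (100 possible values).
If the random number is less than 9, return 1. Otherwise subtract 9 from the random number.
If the adjusted random number is less than 90, return 2. Otherwise subtract 90 from the adjusted random number.
The only remaining possibility is that the adjusted random number is 0, which is less than 1, so return 3.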
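The steps above can be checked exhaustively by running them for every possible random value from 0 to 99 and tallying the results. This is an illustrative sketch only: the class WalkThrough and its methods are hypothetical, and the {9, 90, 1} ranking is hard-coded.

```java
// Illustrative only: hard-codes the {9, 90, 1} ranking from the example.
class WalkThrough {
    // Applies the steps above to one random value r in [0, 99].
    static int actionFor(int r) {
        if (r < 9) {
            return 1;
        }
        r -= 9;
        if (r < 90) {
            return 2;
        }
        r -= 90;
        // Only r == 0 remains here, which is less than 1.
        return 3;
    }

    // Tallies the action chosen for every possible random value.
    static int[] tally() {
        int[] counts = new int[3];
        for (int r = 0; r < 100; r++) {
            counts[actionFor(r) - 1]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = tally();
        // prints "9 90 1"
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```

Because each of the 100 values is equally likely, the tallies 9, 90 and 1 are exactly the relative frequencies the example asks for.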

In general, the pseudocode for the AI function (after the special cases at the beginning) should look like this:

compute the 'sum' of the entries in the 'actionRanking' array
generate a random number `R` between '0' and 'sum-1' inclusive
for each entry in 'actionRanking'
   if the entry is greater than 'R'
      return 'Config.MIN_ACTION' + the index for that entry
   otherwise
      subtract the entry from 'R'
