Question

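I am designing a decision system for a competition in which players must aim at different targets. The probability of scoring differs from target to target, and the more a player scores on a given target, the lower the probability of scoring on that target becomes. Players have a limited number of attempts.

All I can think of are Markov chains and game theory, but I don't know how to implement them, and I wonder whether there are other mathematical techniques for maximizing my score.

I would greatly appreciate any guidance.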

Answer 1

Markov Processes: A Non-Solution

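I don't think a Markov process is going to work here. The Markov property requires that the probability distribution of a process's future states depend only on its present state (or a finite number of past states).

A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present states) depends only upon the present state, not on the sequence of events that preceded it. Since the probability of hitting a target decreases with each successful hit, the future of your process depends on its past and, therefore, your process is not Markov.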

Recursive Brute-Force Search: An Adequate Solution

One way to solve this problem is to explore a search tree. The following C++ code describes the operation:

#include <cstdlib>   //rand, RAND_MAX
#include <iostream>
#include <limits>
#include <utility>   //std::pair, std::make_pair
#include <vector>

std::vector<float> ScoreOn(const std::vector<float> &probs, int target){
  std::vector<float> temp = probs; //Copy original array to avoid corrupting it
  temp[target]          *= 0.9;    //Decrease the probability
  return temp;                     //Return the new array
}

std::pair<float,int> Choice(
  const std::vector<float> &probs,
  const std::vector<float> &values,
  int depth
){
  if(depth==0)                      //We gotta cut this off somewhere
    return std::make_pair(0.0,-1);  //Return 0 value at the end of time

  //Any real choice will have a value greater than this
  float valmax = -std::numeric_limits<float>::infinity();
  //Will shortly be filled with a real choice value
  int choice = -1;

  //Loop through all targets
  for(int t=0;t<probs.size();t++){
    float hit_value  = values[t]+Choice(ScoreOn(probs,t),values,depth-1).first;
    float miss_value = 0        +Choice(probs           ,values,depth-1).first;
    float val        = probs[t]*hit_value+(1-probs[t])*miss_value;
    if(val>valmax){ //Is current target a better choice?
      valmax = val;
      choice = t;
    }
  }
  return std::make_pair(valmax,choice);
}

int main(){
  //Generate sample data and print the current best answer
  int target_count = 8; //Number of targets
  std::vector<float> probs,values;
  for(int t=0;t<target_count;t++){
    probs.push_back(rand()/(float)RAND_MAX);
    values.push_back(80.0*(rand()/(float)RAND_MAX));
  }

  std::cout<<Choice(probs,values,6).first<<std::endl;
}

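Now, consider trying to hit a target. If we hit it, the value of our action (call it H) is the value of the target plus the value of all our future actions. If we miss (M), the value is zero plus the value of all our future actions. We weight these values by the probability p of a hit, which gives the equation:

Value = pH + (1 - p)M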
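Spelled out as a recurrence, using notation introduced here only for illustration ($V_d$ for the best expected value with $d$ attempts left, $v_t$ for the value of target $t$, $\mathbf{p}$ for the current hit probabilities, and $\mathbf{p}^{(t)}$ for those probabilities after a hit on target $t$), the weighting above might be written as:

```latex
V_0(\mathbf{p}) = 0, \qquad
V_d(\mathbf{p}) = \max_t \Big[ p_t \big( v_t + V_{d-1}(\mathbf{p}^{(t)}) \big)
                             + (1 - p_t)\, V_{d-1}(\mathbf{p}) \Big]
```

Here $H = v_t + V_{d-1}(\mathbf{p}^{(t)})$ and $M = V_{d-1}(\mathbf{p})$. In the sample code, $\mathbf{p}^{(t)}$ is $\mathbf{p}$ with $p_t$ multiplied by 0.9.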

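We compute future values the same way, which produces a recursive function. Since this could go on forever, we limit the recursion to a fixed depth. Because the decision tree at each level splits along two paths (hit and miss) for each of t targets, the tree has (2t)**Depth leaves and ((2t)**(Depth+1)-1)/(2t-1) nodes in total. So choose your depth wisely to avoid thinking forever.

On my Intel i5 M480 CPU (now about five years old), the code above takes 0.044 s at depth=5 and 0.557 s at depth=6 with optimizations enabled. At depth=6 with 8 targets, the tree has 16**6 = 16,777,216 leaves (17,895,697 nodes in total), so any single leaf is just one of 16,777,216 possible sequences of choices and outcomes. Unless your value function is weird, there's no need to look that far into the future.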
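To get a feel for how quickly the tree grows before picking a depth, the calls can simply be counted. This is a small sketch, not part of the original program; CountCalls and CountLeaves are hypothetical helper names, and the count assumes every call recurses twice (hit and miss) per target, as Choice does.

```cpp
#include <cstdint>

// Number of calls Choice() would make for the given number of targets and
// depth, including the depth==0 calls that return immediately.
std::uint64_t CountCalls(int targets, int depth) {
  if (depth == 0) return 1;  // a leaf: one call, no recursion
  // This call, plus a hit subtree and a miss subtree for every target
  return 1 + 2ULL * targets * CountCalls(targets, depth - 1);
}

// Number of depth==0 leaves only: (2*targets)^depth.
std::uint64_t CountLeaves(int targets, int depth) {
  std::uint64_t leaves = 1;
  for (int d = 0; d < depth; d++) leaves *= 2ULL * targets;
  return leaves;
}
```

Calling CountCalls with the target count and a few candidate depths shows where the search stops being practical on a given machine.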

Branch and Bound: An Improved Solution

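However, if you do need to explore a larger space or go faster, you could consider using branch and bound methods. These work the same way, except that we choose not to expand any subtree that is provably no better than a solution we have already found. Proving tight upper bounds then becomes the main challenge.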

Answer 2

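Why not use a greedy algorithm?

At each point in time, you can't do better than choosing the target with the highest expected value (probability of a hit multiplied by the value of the target).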
