Question

I have an array of elements:

$arr = array(
  '0' => 265000, // Area
  '1' => 190000,
  '2' => 30000,
  '3' => 1300
);

I want to get a random index based on the area (the array value). Areas with larger values should be picked more often. How can I do this?

What I have now:

$random_idx = mt_rand(0, count($arr)-1);    
$selected_area = (object)$arr[$random_idx];

Thanks!

Answer 1

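1. Repeated values

Suppose we have an array in which each value corresponds to the relative probability of its index. For example, given a coin, the possible outcomes of a toss are 50% tails and 50% heads. We can represent those probabilities with an array like this (I'll use PHP, as that seems to be the language used by the OP):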

$coin = array(     
     'head' => 1,    
     'tails' => 1    
);

The outcomes of rolling two dice, on the other hand, can be represented as:

$dice = array( '2' => 1, '3' => 2, '4' => 3, '5' => 4, '6' => 5, '7' => 6,
               '8' => 5, '9' => 4, '10' => 3, '11' => 2, '12' => 1
);

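A simple way to pick a random key (index) with a probability proportional to the values of those arrays (and therefore consistent with the underlying model) is to create another array whose elements are the keys of the original one, each repeated as many times as its value indicates, and then return a random element of it. For example, for the dice array: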

$arr = array( 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, ...

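Doing so, we can be confident that each key will be picked with the right relative probability. We can encapsulate all the logic in a class whose constructor builds the look-up array and whose get_random_key() method returns a random key using mt_rand():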

class RandomKeyMultiple {
    private $pool = array();
    private $max_range;

    function __construct( $source ) {
        // build the look-up array
        foreach ( $source as $key => $value ) {
            for ( $i = 0; $i < $value; $i++ ) {
                $this->pool[] = $key;
            }
        }
        $this->max_range = count($this->pool) - 1;
    }

    function get_random_key() {
        $x = mt_rand(0, $this->max_range);

        return $this->pool[$x];     
    }
}

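Usage is simple: create an object of the class, passing in the source array, and each call to get_random_key() will then return a random key: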

$test = new RandomKeyMultiple($dice);
echo $test->get_random_key();

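The problem is that the OP's array contains large values, which results in a very big look-up array (486300 elements, which is still manageable, even without dividing all the values by 100).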
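
If the pool size does become a concern, one option is to divide every weight by the greatest common divisor of all weights before building the pool, which leaves the relative probabilities unchanged. The following is only a minimal sketch, assuming non-negative integer weights; the helper names gcd() and reduce_weights() are made up for this example and are not part of the class above:

```php
<?php
// Hypothetical helpers, assuming non-negative integer weights.

// Greatest common divisor of two non-negative integers (Euclid's algorithm).
function gcd(int $a, int $b): int {
    while ($b !== 0) {
        [$a, $b] = [$b, $a % $b];
    }
    return $a;
}

// Divide every weight by the GCD of all weights; the ratios stay the same.
function reduce_weights(array $weights): array {
    $divisor = array_reduce($weights, 'gcd', 0);
    if ($divisor <= 1) {
        return $weights; // nothing to reduce (or all weights are zero)
    }
    return array_map(function ($w) use ($divisor) {
        return intdiv($w, $divisor);
    }, $weights);
}

$arr = array('0' => 265000, '1' => 190000, '2' => 30000, '3' => 1300);
print_r(reduce_weights($arr));
```

For the OP's array the divisor is 100, so the weights become 2650, 1900, 300 and 13, and the pool shrinks from 486300 to 4863 entries.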

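2. Steps

In general, a discrete probability distribution may be more complicated, with floating-point values that cannot easily be converted into a number of repetitions.

Another way to solve the problem is to treat the values in the array as the widths of intervals that divide the global range of all possible values: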

    +---------------------------+-----------------+-------+----+
    |                           |                 |       |    |
    |<---       265000      --->|<--   190000  -->|<30000>|1300| 
    |<-------            455000            ------>|            |
    |<----------              485000            --------->|    |
    |<----------------            486300        -------------->|

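Then we can pick a random number from 0 to 486299 (486300 possible values, the global range) and look up the right index (the odds of each index are proportional to the length of its segment, which gives the correct probability distribution). Something like:

$x = mt_rand(0, 486299);   // mt_rand() includes both bounds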
if ( $x < 265000 )
    return 0;
elseif ( $x < 455000 )
    return 1;
elseif ( $x < 485000 )
    return 2;
else
    return 3;

We can generalize the algorithm and encapsulate all the logic in a class (using a helper array to store the partial sums):

class RandomKey {
    private $steps = array();
    private $last_key;
    private $max_range;

    function __construct( $source ) {
        // sort in ascending order to partially avoid numerical issues
        asort($source);  

        // calculate the partial sums. Considering OP's array:
        //
        //   1300 ---->       0
        //  30000 ---->    1300
        // 190000 ---->   31300
        // 265000 ---->  221300  ending with $partial = 486300
        //
        $partial = 0;
        $temp = 0;
        foreach ( $source as $k => &$v ) {
            $temp = $v;
            $v = $partial;
            $partial += $temp;
        }

        // scale the steps to cover the entire mt_rand() range
        $factor = mt_getrandmax() / $partial;
        foreach ( $source as $k => &$v ) {
            $v *= $factor;
        }

        // Having the most probable outcomes first minimizes the look-up of
        // the correct index; pass true so that integer keys are preserved
        $this->steps = array_reverse($source, true);

        // remove the last element (not needed during checks) but save the key
        end($this->steps);
        $this->last_key = key($this->steps);
        array_pop($this->steps);
    }

    function get_random_key() {
        $x = mt_rand();

        foreach ( $this->steps as $key => $value ) {
            if ( $x > $value ) {
                return $key;
            }
        }
        return $this->last_key;     
    }

}

There are live demos here or here, with some examples and helper functions for checking the probability distribution of the keys.
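
The demo code is not reproduced in this thread. As a rough idea of what such a check can look like, here is a minimal sketch that draws many keys and compares the observed frequencies with the expected ones. The helper name count_picks() and the stand-in picker are assumptions made for this example:

```php
<?php
// Hypothetical helper: call $pick() $draws times and count how often
// each key comes back.
function count_picks(callable $pick, int $draws): array {
    $counts = array();
    for ($i = 0; $i < $draws; $i++) {
        $key = $pick();
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    ksort($counts);
    return $counts;
}

$arr = array('0' => 265000, '1' => 190000, '2' => 30000, '3' => 1300);

// Stand-in picker based on the interval idea; to test one of the classes,
// pass array($object, 'get_random_key') instead.
$pick = function () use ($arr) {
    $x = mt_rand(0, array_sum($arr) - 1);
    foreach ($arr as $key => $weight) {
        if ($x < $weight) {
            return $key;
        }
        $x -= $weight;
    }
};

$draws = 100000;
foreach (count_picks($pick, $draws) as $key => $count) {
    printf("%s: observed %.4f, expected %.4f\n",
        $key, $count / $draws, $arr[$key] / array_sum($arr));
}
```

With enough draws the observed and expected columns should agree to within a small sampling error.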

For larger arrays, a binary search can also be considered for looking up the index.
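
A minimal sketch of that binary-search variant follows. It stores running totals instead of scaled steps and assumes non-negative integer weights with a positive total; the class name RandomKeyBinarySearch is made up for this example:

```php
<?php
// Hypothetical class, assuming non-negative integer weights whose sum is
// positive. Each look-up costs O(log n) comparisons.
class RandomKeyBinarySearch {
    private $keys = array();
    private $cumulative = array(); // running totals, one per kept key
    private $total = 0;

    function __construct(array $source) {
        foreach ($source as $key => $weight) {
            if ($weight <= 0) {
                continue; // a zero weight can never be picked
            }
            $this->total += $weight;
            $this->keys[] = $key;
            $this->cumulative[] = $this->total;
        }
    }

    function get_random_key() {
        $x = mt_rand(0, $this->total - 1);

        // find the first running total that is strictly greater than $x
        $lo = 0;
        $hi = count($this->cumulative) - 1;
        while ($lo < $hi) {
            $mid = intdiv($lo + $hi, 2);
            if ($this->cumulative[$mid] > $x) {
                $hi = $mid;
            } else {
                $lo = $mid + 1;
            }
        }
        return $this->keys[$lo];
    }
}

$picker = new RandomKeyBinarySearch(
    array('0' => 265000, '1' => 190000, '2' => 30000, '3' => 1300)
);
echo $picker->get_random_key(), "\n";
```

No sorting is needed here, because the search does not rely on the most probable keys coming first.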

Answer 2

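This solution is based on the index of an element, not on its value. So we need the array to be sorted, to make sure that elements with bigger values have bigger indices.

A plain random index generator can be pictured as the linear dependency x = y: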

(y)

a i     4             +
r n     3          +
r d     2       +
a e     1    +
y x     0 +
          0  1  2  3  4      

          r a n d o m
          n u m b e r   (x)

We need to generate indices non-linearly (the bigger the index, the higher its probability):

a i     4                               +  +  +  +  +
r n     3                   +  +  +  +
r d     2          +  +  +
a e     1    +  + 
y x     0 +  
          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14

          r a n d o m
          n u m b e r

To find the range of x values for an array of length c, we can calculate the sum of all the numbers in the range 0..c:

(c * (c + 1)) / 2;

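To find y for any x, we need to solve the quadratic equation

y ^ 2 + y - 2 * x = 0;

Solving it gives

y = (sqrt(8 * x + 1) - 1) / 2;

Putting it all together: pick a random x between 0 and c * (c + 1) / 2 - 1 and round y down to get the index.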
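
The formula can be sketched in PHP as follows. This is an illustration only, not the answerer's original code; the function name random_weighted_index() is an assumption:

```php
<?php
// Hypothetical function: returns an index in 0 .. $count - 1, where
// index i is picked with probability proportional to i + 1.
function random_weighted_index(int $count): int {
    // there are c * (c + 1) / 2 possible values of x
    $max_x = intdiv($count * ($count + 1), 2) - 1;
    $x = mt_rand(0, $max_x);

    // y = (sqrt(8x + 1) - 1) / 2, rounded down
    return (int) floor((sqrt(8 * $x + 1) - 1) / 2);
}

$arr = array('0' => 265000, '1' => 190000, '2' => 30000, '3' => 1300);
asort($arr);               // ascending by value, keys preserved
$keys = array_keys($arr);  // position in $keys = rank of the area
echo $keys[random_weighted_index(count($keys))], "\n";
```

Note that with this scheme the odds depend only on each element's rank after sorting (1/10, 2/10, 3/10 and 4/10 for four elements), not on the actual sizes of the areas.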

In terms of performance, this solution suits large arrays best: picking an index does not depend on the array size or type.

Answer 3

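Your array describes a discrete probability distribution. Each array value ('area' or 'weight') relates to the probability that a discrete random variable takes a specific value from the range of array keys.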

/**
 * Draw a pseudorandom sample from the given discrete probability distribution.
 * The input array values will be normalized and do not have to sum up to one.
 *
 * @param array $arr Array of samples => discrete probabilities (weights).
 * @return sample
 */
function draw_discrete_sample($arr) {
    $rand = mt_rand(0, array_sum($arr) - 1);
    foreach ($arr as $key => $weight) {
        if (($rand -= $weight) < 0) return $key;
    }
}

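If you want to support non-integer weights/probabilities, replace the first line with $rand = mt_rand() / mt_getrandmax() * array_sum($arr);.

You might also want to have a look at the similar question asked here. If you are only interested in sampling from a small set of known distributions, I suggest using the analytic approach outlined by Oleg Mikhailov.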
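
One caveat with the floating-point variant: when mt_rand() returns exactly mt_getrandmax(), or when rounding leaves a tiny remainder, the loop can finish without returning anything. A minimal sketch that guards against this, assuming non-negative weights with a positive sum (the function name draw_discrete_sample_float() is made up for this example):

```php
<?php
// Hypothetical variant of draw_discrete_sample() for non-integer weights,
// assuming non-negative weights whose sum is positive.
function draw_discrete_sample_float(array $arr) {
    // uniform float between 0 and the sum of the weights (both inclusive)
    $rand = mt_rand() / mt_getrandmax() * array_sum($arr);
    $last_key = null;
    foreach ($arr as $key => $weight) {
        if ($weight <= 0) {
            continue; // zero-weight samples are never drawn
        }
        $last_key = $key;
        if (($rand -= $weight) < 0) {
            return $key;
        }
    }
    // reached only when $rand hit the upper bound or rounding left a
    // tiny remainder: fall back to the last sample with a positive weight
    return $last_key;
}

echo draw_discrete_sample_float(array('a' => 0.5, 'b' => 0.25, 'c' => 0.25)), "\n";
```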

Answer 4

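This problem is somewhat similar to the way operating systems pick the next thread to run with lottery scheduling.

The idea is to assign each area a number of tickets depending on its size, and to number all those tickets. Depending on which random number is drawn, you know which ticket won and therefore which area won.

First you need to sum up all the areas and pick a random number below this total. Then you just iterate through the array and look for the first element whose running total up to that point is larger than the random number.

Assuming you are looking for a solution in PHP: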

function get_random_index($array) {
    // generate total
    $total = array_sum($array);
    // get a random number in the required range
    $random_number = rand(0, $total-1);
    // temporary sum needed to find the 'winning' area
    $temp_total = 0;
    // this variable helps us identify the winning area
    $current_area_index = 0;

    foreach ($array as $area) {
        // add the area to our temporary total
        $temp_total = $temp_total + $area;

        // check if we already have the right ticket
        if($temp_total > $random_number) {
            return $current_area_index;
        }
        else {
            // this area didn't win, so check the next one
            $current_area_index++;
        }
    }
}
