Selection based on percentage weighting

Time: 2010-09-07 02:52:10

Tags: c# python algorithm random

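I have a set of values, and an associated percentage for each:

a: 70% chance, b: 20% chance, c: 10% chance

I want to select a value (a, b, c) based on the given percentage chance.

How do I approach this?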


My attempt so far looks like this:

r = random.random()
if r <= .7:
    return a
elif r <= .9:
    return b
else: 
    return c

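I'm stuck coming up with an algorithm to handle this. How should I approach it so that it can handle larger sets of values without just chaining together if-else branches?


(Any explanation or answers in pseudocode are fine. A Python or C# implementation would be especially helpful.)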

13 answers:

Answer 0 (score: 36)

Here is a complete solution in C#:

public class ProportionValue<T>
{
    public double Proportion { get; set; }
    public T Value { get; set; }
}

public static class ProportionValue
{
    public static ProportionValue<T> Create<T>(double proportion, T value)
    {
        return new ProportionValue<T> { Proportion = proportion, Value = value };
    }

    static Random random = new Random();
    public static T ChooseByRandom<T>(
        this IEnumerable<ProportionValue<T>> collection)
    {
        var rnd = random.NextDouble();
        foreach (var item in collection)
        {
            if (rnd < item.Proportion)
                return item.Value;
            rnd -= item.Proportion;
        }
        throw new InvalidOperationException(
            "The proportions in the collection do not add up to 1.");
    }
}

Usage:

var list = new[] {
    ProportionValue.Create(0.7, "a"),
    ProportionValue.Create(0.2, "b"),
    ProportionValue.Create(0.1, "c")
};

// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());

Answer 1 (score: 9)

For Python:

>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> for _ in range(12): print random.choice(picks),
... 
a c c b a a a a a a a a
>>> for _ in range(12): print random.choice(picks),
... 
a c a c a b b b a a a a
>>> for _ in range(12): print random.choice(picks),
... 
a a a a c c a c a a c a
>>> 

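The general idea: make a list in which each item is repeated a number of times proportional to the probability it should have; use random.choice to pick one at random (uniformly), and this will match your required probability distribution. It can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-item list where 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that is likely to be a big deal in your specific application scenario.

Apart from memory consumption issues, this should be the fastest solution: just one random number generation per required output result, and the fastest possible lookup from that random number, with no comparisons etc. If your likely probabilities are very weird (e.g., floating-point numbers that need to be matched to many significant digits), other approaches may be preferable ;-).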
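
The greatest-common-factor reduction mentioned above could be sketched as follows. This is only an illustration, assuming Python 3.5+ and strictly positive integer counts; the helper name build_population is made up for this example and is not part of the answer.

```python
import random
from functools import reduce
from math import gcd

def build_population(values, counts):
    """Repeat each value in proportion to its integer count, after
    dividing all counts by their greatest common divisor.
    (Hypothetical helper; assumes positive integer counts.)"""
    divisor = reduce(gcd, counts)               # gcd(70, 20, 10) is 10
    reduced = [c // divisor for c in counts]    # 70, 20, 10 becomes 7, 2, 1
    return [v for v, c in zip(values, reduced) for _ in range(c)]

picks = build_population(('a', 'b', 'c'), (70, 20, 10))
print(len(picks))            # 10 entries instead of 100
print(random.choice(picks))  # 'a' roughly 70% of the time
```

The resulting list behaves the same under random.choice as the 100-item list, since only the ratios between the counts matter.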

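Answer 2 (score: 8)

Knuth references Walker's method of aliases. Searching on this, I found http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/ and http://prxq.wordpress.com/2006/04/17/the-alias-method/. This gives the exact probabilities required in constant time per number generated, with linear time for setup (curiously, setup takes n log n time if you use exactly the method Knuth describes, which does a preparatory sort that you can avoid).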
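
For readers who want to see the shape of the alias method without following the links, here is a minimal Python sketch. It uses the common linear-time "small/large worklist" construction rather than Knuth's exact procedure, and the function names build_alias_table and alias_draw are invented for this example. It assumes non-negative weights with a positive sum.

```python
import random

def build_alias_table(weights):
    """Preprocess weights into (prob, alias) tables in O(n) time."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]   # rescale so the mean is 1.0
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]              # slot s keeps s with this probability
        alias[s] = l                     # otherwise it yields l
        scaled[l] -= 1.0 - scaled[s]     # l gave away the rest of slot s
        (small if scaled[l] < 1.0 else large).append(l)
    # Anything left over (rounding) keeps prob 1.0 and aliases to itself.
    return prob, alias

def alias_draw(prob, alias, rng=random):
    """Draw one index in O(1): pick a slot, then its owner or its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

values = ['a', 'b', 'c']
prob, alias = build_alias_table([70, 20, 10])
print(values[alias_draw(prob, alias)])
```

Each draw costs one slot choice and one comparison regardless of how many values there are, which is where the constant time per generated number comes from.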

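Answer 3 (score: 6)

Take the list and find the cumulative totals of the weights: 70, 70+20, 70+20+10. Pick a random number greater than or equal to zero and less than the total. Iterate over the items and return the first value for which the cumulative sum of the weights is greater than this random number: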

import random

def select( values ):
    variate = random.random() * sum( values.values() )
    cumulative = 0.0
    for item, weight in values.items():
        cumulative += weight
        if variate < cumulative:
            return item
    return item # Shouldn't get here, but just in case of rounding...

print( select( { "a": 70, "b": 20, "c": 10 } ) )

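This solution, as implemented, should also be able to handle fractional weights and weights that add up to any number, as long as they are all non-negative.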

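Answer 4 (score: 3)

  1. Let T = the sum of all item weights
  2. Let R = a random number between 0 and T
  3. Iterate over the item list, subtracting each item's weight from R, and return the item that causes the result to become <= 0.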
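
The steps above can be sketched in Python roughly as follows. The function name pick_by_subtraction and the optional rng parameter are assumptions made for this example. One deliberate deviation: the sketch tests for a result strictly below zero, because random.random() can return exactly 0.0 and a "<= 0" test would then let a zero-weight first item be picked.

```python
import random

def pick_by_subtraction(items, rng=random):
    """items is a list of (value, weight) pairs with non-negative weights.
    (Hypothetical helper illustrating the three steps.)"""
    total = sum(weight for _, weight in items)   # step 1: T
    remaining = rng.random() * total             # step 2: R in [0, T)
    for value, weight in items:                  # step 3: subtract and test
        remaining -= weight
        if remaining < 0:
            return value
    return items[-1][0]  # only reachable through floating-point rounding

print(pick_by_subtraction([('a', 70), ('b', 20), ('c', 10)]))
```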

Answer 5 (score: 3)

def weighted_choice(probabilities):
    random_position = random.random() * sum(probabilities)
    current_position = 0.0
    for i, p in enumerate(probabilities):
        current_position += p
        if random_position < current_position:
            return i
    return None

Because random.random will always return a value < 1.0, the final return should never be reached.
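
One way to sanity-check a function like this is to tally a large number of draws and compare the observed frequencies with the requested ones. The sketch below repeats the function so it runs on its own; the rng parameter, the helper observed_frequencies, the seed and the draw count are all choices made for this example.

```python
import random
from collections import Counter

def weighted_choice(probabilities, rng=random):
    """Same cumulative scan as in the answer; returns an index."""
    random_position = rng.random() * sum(probabilities)
    current_position = 0.0
    for i, p in enumerate(probabilities):
        current_position += p
        if random_position < current_position:
            return i
    return None

def observed_frequencies(probabilities, draws=100000, seed=12345):
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    counts = Counter(weighted_choice(probabilities, rng) for _ in range(draws))
    return [counts[i] / draws for i in range(len(probabilities))]

print(observed_frequencies([0.7, 0.2, 0.1]))  # values close to 0.7, 0.2, 0.1
```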

Answer 6 (score: 2)

import random

def selector(weights):
    i=random.random()*sum(x for x,y in weights)
    for w,v in weights:
        if w>=i:
            break
        i-=w
    return v

weights = ((70,'a'),(20,'b'),(10,'c'))
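print([selector(weights) for x in range(10)])

It works equally well for fractional weights

weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
print([selector(weights) for x in range(10)])

If you have a lot of weights, you can use bisect to reduce the number of iterations required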

import random
import bisect

def make_acc_weights(weights):
    acc=0
    acc_weights = []
    for w,v in weights:
        acc+=w
        acc_weights.append((acc,v))
    return acc_weights

def selector(acc_weights):
    # The last accumulated weight is the total, so no global lookup is needed.
    i=random.random()*acc_weights[-1][0]
    return acc_weights[bisect.bisect(acc_weights, (i,))][1]

weights = ((70,'a'),(20,'b'),(10,'c'))
acc_weights = make_acc_weights(weights)    
print([selector(acc_weights) for x in range(100)])

Also works fine for fractional weights

weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
acc_weights = make_acc_weights(weights)    
print([selector(acc_weights) for x in range(100)])

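Answer 7 (score: 2)

Today, the update of the Python documentation gives an example of making a random.choice() with weighted probabilities:

If the weights are small integer ratios, a simple technique is to build a sample population with repeats: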

>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'

A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():

>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'

A note: itertools.accumulate() needs Python 3.2 or later; on older versions, define it yourself using the equivalent code.
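
On Python 3.6 and later, the standard library also offers random.choices(), which takes the weights directly and does the cumulative-distribution lookup internally. A short sketch reusing the sample data from the documentation excerpt above (the variable names one_pick and ten_picks are just for illustration):

```python
import random

weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
choices, weights = zip(*weighted_choices)

# random.choices returns a list of k picks, drawn with replacement.
one_pick = random.choices(choices, weights=weights, k=1)[0]
ten_picks = random.choices(choices, weights=weights, k=10)
print(one_pick, ten_picks)
```

The weights are relative, so they do not need to sum to 1 or to 100.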

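Answer 8 (score: 1)

I think you can have an array of small objects (I implemented this in Java; although I know a little C#, I am afraid of writing wrong code), so you may need to port it yourself. The code in C# would be much smaller with struct and var, but I hope you get the idea.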

class PercentString {
  double percent;
  String value;
  // Constructor for 2 values
}

ArrayList<PercentString> list = new ArrayList<PercentString>();
list.add(new PercentString(70, "a"));
list.add(new PercentString(20, "b"));
list.add(new PercentString(10, "c"));

double random = Math.random() * 100; // the weights above are percentages summing to 100
double percent = 0;
for (int i = 0; i < list.size(); i++) {
  PercentString p = list.get(i);
  percent += p.percent;
  if (random < percent) {
    return p.value;
  }
}

Answer 9 (score: 1)

My own solution:

public class Randomizator3000 
{    
public class Item<T>
{
    public T value;
    public float weight;

    public static float GetTotalWeight<T>(Item<T>[] p_itens)
    {
        float __toReturn = 0;
        foreach(var item in p_itens)
        {
            __toReturn += item.weight;
        }

        return __toReturn;
    }
}

private static System.Random _randHolder;
private static System.Random _random
{
    get 
    {
        if(_randHolder == null)
            _randHolder = new System.Random();

        return _randHolder;
    }
}

public static T PickOne<T>(Item<T>[] p_itens)
{   
    if(p_itens == null || p_itens.Length == 0)
    {
        return default(T);
    }

    float __randomizedValue = (float)_random.NextDouble() * (Item<T>.GetTotalWeight(p_itens));
    float __adding = 0;
    for(int i = 0; i < p_itens.Length; i ++)
    {
        float __cacheValue = p_itens[i].weight + __adding;
        if(__randomizedValue <= __cacheValue)
        {
            return p_itens[i].value;
        }

        __adding = __cacheValue;
    }

    return p_itens[p_itens.Length - 1].value;

}
}

Using it should look something like this (in Unity3d):

using UnityEngine;
using System.Collections;

public class teste : MonoBehaviour 
{
Randomizator3000.Item<string>[] lista;

void Start()
{
    lista = new Randomizator3000.Item<string>[10];
    lista[0] = new Randomizator3000.Item<string>();
    lista[0].weight = 10;
    lista[0].value = "a";

    lista[1] = new Randomizator3000.Item<string>();
    lista[1].weight = 10;
    lista[1].value = "b";

    lista[2] = new Randomizator3000.Item<string>();
    lista[2].weight = 10;
    lista[2].value = "c";

    lista[3] = new Randomizator3000.Item<string>();
    lista[3].weight = 10;
    lista[3].value = "d";

    lista[4] = new Randomizator3000.Item<string>();
    lista[4].weight = 10;
    lista[4].value = "e";

    lista[5] = new Randomizator3000.Item<string>();
    lista[5].weight = 10;
    lista[5].value = "f";

    lista[6] = new Randomizator3000.Item<string>();
    lista[6].weight = 10;
    lista[6].value = "g";

    lista[7] = new Randomizator3000.Item<string>();
    lista[7].weight = 10;
    lista[7].value = "h";

    lista[8] = new Randomizator3000.Item<string>();
    lista[8].weight = 10;
    lista[8].value = "i";

    lista[9] = new Randomizator3000.Item<string>();
    lista[9].weight = 10;
    lista[9].value = "j";
}


void Update () 
{
    Debug.Log(Randomizator3000.PickOne<string>(lista));
}
}

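In this example, each value has a 10% chance of being displayed in the debug log.

Answer 10 (score: 0)

If you are really after speed and want to generate the random values quickly, Walker's algorithm, which mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517, is pretty much the best way to go (O(1) time for random(), and O(N) time for preprocess()).

For anyone interested, here is my own PHP implementation of the algorithm: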

/**
 * Pre-process the samples (Walker's alias method).
 * @param array key represents the sample, value is the weight
 */
protected function preprocess($weights){

    $N = count($weights);
    $sum = array_sum($weights);
    $avg = $sum / (double)$N;

    //divide the array of weights to values smaller and geq than sum/N 
    $smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm;}); $sN = count($smaller); 
    $greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm;}); $gN = count($greater_eq);

    $bin = array(); //bins

    //we want to fill N bins
    for($i = 0;$i<$N;$i++){
        //At first, decide for a first value in this bin
        //if there are small intervals left, we choose one
        if($sN > 0){  
            $choice1 = each($smaller); 
            unset($smaller[$choice1['key']]);
            $sN--;
        } else{  //otherwise, we split a large interval
            $choice1 = each($greater_eq); 
            unset($greater_eq[$choice1['key']]);
        }

        //splitting happens here - the unused part of interval is thrown back to the array
        if($choice1['value'] >= $avg){
            if($choice1['value'] - $avg >= $avg){
                $greater_eq[$choice1['key']] = $choice1['value'] - $avg;
            }else if($choice1['value'] - $avg > 0){
                $smaller[$choice1['key']] = $choice1['value'] - $avg;
                $sN++;
            }
            //this bin comprises of only one value
            $bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
        }else{
            //make the second choice for the current bin
            $choice2 = each($greater_eq);
            unset($greater_eq[$choice2['key']]);

            //splitting on the second interval
            if($choice2['value'] - $avg + $choice1['value'] >= $avg){
                $greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
            }else{
                $smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                $sN++;
            }

            //this bin comprises of two values
            $choice2['value'] = $avg - $choice1['value'];
            $bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
                           'p1'=>$choice1['value'] / $avg, 
                           'p2'=>$choice2['value'] / $avg);
        }
    }

    $this->bins = $bin;
}

/**
 * Choose a random sample according to the weights.
 */
public function random(){
    $bin = $this->bins[array_rand($this->bins)];
    return (lcg_value() < $bin['p1'])?$bin[1]:$bin[2];
}

Answer 11 (score: 0)

Here is my version, which can be applied to any IList and normalizes the weights. It is based on Timwi's solution: selection based on percentage weighting

/// <summary>
/// return a random element of the list or default if list is empty
/// </summary>
/// <param name="e"></param>
/// <param name="weightSelector">
/// return chances to be picked for the element. A weight of 0 or less means 0 chance to be picked.
/// If all elements have weight of 0 or less they all have equal chances to be picked.
/// </param>
/// <returns></returns>
public static T AnyOrDefault<T>(this IList<T> e, Func<T, double> weightSelector)
{
    if (e.Count < 1)
        return default(T);
    if (e.Count == 1)
        return e[0];
    var weights = e.Select(o => Math.Max(weightSelector(o), 0)).ToArray();
    var sum = weights.Sum(d => d);

    var rnd = new Random().NextDouble();
    for (int i = 0; i < weights.Length; i++)
    {
        //Normalize weight
        var w = sum == 0
            ? 1 / (double)e.Count
            : weights[i] / sum;
        if (rnd < w)
            return e[i];
        rnd -= w;
    }
    throw new Exception("Should not happen");
}

Answer 12 (score: 0)

Loosely based on Python's numpy.random.choice(a=items, p=probs), which takes an array and a probability array of the same size.

    public T RandomChoice<T>(IEnumerable<T> a, IEnumerable<double> p)
    {
        IEnumerator<T> ae = a.GetEnumerator();
        Random random = new Random();
        double target = random.NextDouble();
        double accumulator = 0;
        foreach (var prob in p)
        {
            ae.MoveNext();
            accumulator += prob;
            if (accumulator > target)
            {
                break;
            }
        }
        return ae.Current;
    }

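The probability array p must sum to (approximately) 1. This is to keep it consistent with the numpy interface (and the mathematics), but you could easily change that if you wanted.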
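
For comparison, a standard-library Python sketch of the same accumulate-and-compare loop, with the "sums to about 1" requirement checked explicitly instead of assumed. The function name random_choice, the tolerance and the rng parameter are assumptions of this example; it does not call numpy.

```python
import math
import random

def random_choice(a, p, rng=random):
    """Return one element of a, where p[i] is the probability of a[i]."""
    a, p = list(a), list(p)
    if len(a) != len(p):
        raise ValueError("a and p must have the same length")
    if not math.isclose(sum(p), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    target = rng.random()
    accumulator = 0.0
    for item, prob in zip(a, p):
        accumulator += prob
        if accumulator > target:
            return item
    return a[-1]  # rounding left the running sum just below target

print(random_choice(['a', 'b', 'c'], [0.7, 0.2, 0.1]))
```

To accept arbitrary weights instead, one option is to drop the sum check and multiply target by sum(p).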