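A weighted version of random.choice

Time: 2010-09-09 18:59:23

Tags: python optimization

I needed to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with: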

import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    # list(...) keeps this working on Python 3, where keys() is a view
    for key in sorted(list(space.keys()) + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

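This function seems overly complex to me, and ugly. I'm hoping everyone here can offer suggestions for improving it or alternative ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.

26 answers:

Answer 0: (score: 231)

Since version 1.7.0, NumPy has a choice function that supports probability distributions.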

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)

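Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.

Answer 1: (score: 133)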

def weighted_choice(choices):
   total = sum(w for c, w in choices)
   r = random.uniform(0, total)
   upto = 0
   for c, w in choices:
      if upto + w >= r:
         return c
      upto += w
   assert False, "Shouldn't get here"

Answer 2: (score: 124)

Since Python 3.6 there is a choices method in the random module.

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import random

In [2]: random.choices(
...:     population=[['a','b'], ['b','a'], ['c','b']],
...:     weights=[0.2, 0.2, 0.6],
...:     k=10
...: )

Out[2]:
[['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b']]

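People have also mentioned that numpy.random.choice supports weights, but it does not support 2-D arrays and so on.

So if you have Python 3.6.x, the built-in random.choices gives you essentially whatever you need (see the update).

UPDATE: As @roganjosh kindly pointed out, random.choices cannot return values without replacement, as the docs state:

  Return a k sized list of elements chosen from the population with replacement.

And @ronan-paixão's brilliant answer states that numpy.random.choice has a replace argument that controls this behaviour.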
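
If NumPy is not an option, weighted sampling without replacement can still be done with the standard library. The sketch below is not from any answer in this thread; it uses the well-known key trick attributed to Efraimidis and Spirakis (give each item the key u ** (1 / w) for a fresh uniform u, then keep the k largest keys). The function name and its parameters are made up for illustration, and weights are assumed to be non-negative numbers.

```python
import heapq
import random

def weighted_sample_without_replacement(population, weights, k, rng=random):
    """Pick k distinct items; heavier items are more likely to be picked."""
    keyed = []
    for item, w in zip(population, weights):
        if w > 0:  # zero-weight items can never be selected
            keyed.append((rng.random() ** (1.0 / w), item))
    if k > len(keyed):
        raise ValueError("k exceeds the number of items with positive weight")
    # Keep the k items with the largest keys; compare on the key only
    best = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [item for _, item in best]

picked = weighted_sample_without_replacement(["a", "b", "c", "d"], [5, 1, 3, 0], 3)
print(picked)
```

Because "d" has weight 0 in this call, the three picks are always some ordering of "a", "b" and "c".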

Answer 3: (score: 67)

  1. Arrange the weights into a cumulative distribution.
  2. Use random.random() to pick a random float 0.0 <= x < total.
  3. Search the distribution using bisect.bisect, as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.

from random import random
from bisect import bisect

def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]

>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'

If you need to make more than one choice, split this into two functions, one to build the cumulative weights and another to bisect to a random point.
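
The two-function split suggested above might look like the following. This is only a sketch of that suggestion; the names build_cum_weights and pick are illustrative and do not come from the answer. The upper bound passed to bisect guards against a rounding edge case where the random point lands exactly on the total.

```python
import bisect
import random

def build_cum_weights(choices):
    """Run once: separate the values and accumulate their weights."""
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    return values, cum_weights

def pick(values, cum_weights):
    """Run per draw: bisect to a random point in the cumulative weights."""
    x = random.random() * cum_weights[-1]
    i = bisect.bisect(cum_weights, x, 0, len(cum_weights) - 1)
    return values[i]

values, cum_weights = build_cum_weights([("WHITE", 90), ("RED", 8), ("GREEN", 2)])
draws = [pick(values, cum_weights) for _ in range(5)]
print(draws)
```

The cumulative list is built once in O(n), and each later draw costs only a binary search.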

Answer 4: (score: 18)

If you don't mind using numpy, you can use numpy.random.choice.

For example:

import numpy

items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
elems = [i[0] for i in items]
probs = [i[1] for i in items]

trials = 1000
results = [0] * len(items)
for i in range(trials):
    # This is where the item is selected! Pass the 1-D list of names, not the nested items list.
    res = numpy.random.choice(elems, p=probs)
    results[elems.index(res)] += 1
results = [r / float(trials) for r in results]
print("item\texpected\tactual")
for i in range(len(probs)):
    print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))

If you know how many selections you need to make in advance, you can do it without a loop like this:

numpy.random.choice(elems, trials, p=probs)

Answer 5: (score: 16)

Crude, but may be sufficient:

import random
weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

# initialize tally dict
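# key the tally by the value alone, since that is what weighted_choice returns
tally = dict.fromkeys((v for v, wt in choices), 0)

# tally up 1000 weighted choices
for i in range(1000):
    tally[weighted_choice(choices)] += 1

print(list(tally.items()))

Prints: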

[('WHITE', 904), ('GREEN', 22), ('RED', 74)]

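This assumes that all weights are integers. They don't have to add up to 100; I just did that to make the test results easier to interpret. (If the weights are floating-point numbers, multiply them all by 10 repeatedly until all weights are >= 1.)

weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
    weights = [w*10 for w in weights]
# Build a real list so this also works on Python 3, where map() is lazy
weights = [int(w) for w in weights]

Answer 6: (score: 15)

If you have a weighted dictionary instead of a list, you can write this: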

items = { "a": 10, "b": 5, "c": 1 } 
random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces a list such as ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b'] (the order follows the dict's iteration order).
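
The expanded list grows with the sum of the weights and only works for integer weights. On Python 3.6 or later, the same weighted dictionary can be passed to random.choices without building that list. This is a small sketch of that alternative, not part of the answer above:

```python
import random

items = {"a": 10, "b": 5, "c": 1}

# keys() and values() iterate in the same order, so they line up
pick = random.choices(list(items.keys()), weights=list(items.values()), k=1)[0]
print(pick)
```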

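Answer 7: (score: 11)

Since Python v3.6, random.choices can be used to return a list of elements of a specified size from a given population, with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)

  • population: the list of observations to draw from. (If empty, raises IndexError.)

  • weights: more precisely, the relative weights used to make selections.

  • cum_weights: the cumulative weights used to make selections.

  • k: the size (len) of the list to be output. (Default: len() = 1.)

A few caveats:

1) It uses weighted sampling with replacement, so drawn items are put back and can be drawn again. The values in the weights sequence do not matter in themselves, but their relative ratios do.

Unlike np.random.choice, which only accepts probabilities as weights and requires the individual probabilities to sum to 1, there is no such restriction here. As long as the weights are of numeric types (int/float/fraction, but not Decimal), they will work.

>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

2) If neither weights nor cum_weights is specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.

>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']

3) cum_weights is typically the result of the itertools.accumulate function, which is really handy in such situations.

From the linked documentation:

  Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome, and the latter seems to be faster / more efficient.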
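
A quick way to see that the two spellings are interchangeable is to reseed the generator before each call. This is a small sketch rather than part of the original answer; the seed value is arbitrary, and the comparison relies on CPython accumulating weights into cum_weights internally, as the quoted documentation describes.

```python
import itertools
import random

population = ["white", "green", "red"]
weights = [12, 12, 4]
cum_weights = list(itertools.accumulate(weights))  # running totals of weights

random.seed(2024)
from_weights = random.choices(population, weights=weights, k=10)

random.seed(2024)
from_cum_weights = random.choices(population, cum_weights=cum_weights, k=10)

# Same seed, same underlying cumulative weights, so the draws should match
print(from_weights == from_cum_weights)
```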

Answer 8: (score: 9)

Here is the version that is included in the standard library for Python 3.6:

import random
import itertools as _itertools
import bisect as _bisect

class Random36(random.Random):
    "Show the code included in the Python 3.6 version of the Random class"

    def choices(self, population, weights=None, *, cum_weights=None, k=1):
        """Return a k sized list of population elements chosen with replacement.

        If the relative weights or cumulative weights are not specified,
        the selections are made with equal probability.

        """
        random = self.random
        if cum_weights is None:
            if weights is None:
                _int = int
                total = len(population)
                return [population[_int(random() * total)] for i in range(k)]
            cum_weights = list(_itertools.accumulate(weights))
        elif weights is not None:
            raise TypeError('Cannot specify both weights and cumulative weights')
        if len(cum_weights) != len(population):
            raise ValueError('The number of weights does not match the population')
        bisect = _bisect.bisect
        total = cum_weights[-1]
        return [population[bisect(cum_weights, random() * total)] for i in range(k)]

Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l340

Answer 9: (score: 4)

I'd require the sum of the choices to be 1, but this works anyway:

import random

def weightedChoice(choices):
    # Safety check, you can remove it
    for c, w in choices:
        assert w >= 0

    # Draw a point in [0, total weight]; sum the weights (w), not the items (c)
    tmp = random.uniform(0, sum(w for c, w in choices))
    for choice, weight in choices:
        if tmp < weight:
            return choice
        else:
            tmp -= weight
    raise ValueError('Negative values in input')

Answer 10: (score: 3)

A very basic and easy approach to weighted selection is the following:

import numpy as np

np.random.choice(['A', 'B', 'C'], p=[0.3, 0.4, 0.3])

Answer 11: (score: 2)

import numpy as np
w=np.array([ 0.4,  0.8,  1.6,  0.8,  0.4])
np.random.choice(w, p=w/sum(w))

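Answer 12: (score: 2)

I may be too late to contribute anything useful, but here's a simple, short, and very efficient snippet:

import random

def choose_index(probabilities):
    cmf = probabilities[0]
    choice = random.random()
    for k in range(len(probabilities) - 1):
        if choice <= cmf:
            return k
        cmf += probabilities[k + 1]
    # Only the last index is left (this also guards against rounding error)
    return len(probabilities) - 1

There is no need to sort your probabilities or build a vector with your cmf, and it terminates once it finds its choice. Memory: O(1), time: O(N), with an average running time of about N/2.

If you have weights rather than probabilities, just normalize them first:

def choose_index(weights):
    total = float(sum(weights))
    probabilities = [w / total for w in weights]
    cmf = probabilities[0]
    choice = random.random()
    for k in range(len(probabilities) - 1):
        if choice <= cmf:
            return k
        cmf += probabilities[k + 1]
    return len(probabilities) - 1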

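Answer 13: (score: 1)

It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then the time complexity of calling np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.

In my case, I needed to sample the same distribution many times, on the order of 10^3 draws, with n on the order of 10^6. I used code that precomputes the cumulative distribution once and then samples it in O(log(n)) per draw. The overall time complexity is O(n + K*log(n)).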
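
The answer's own code did not survive in this copy of the thread. The following is a hedged reconstruction of the idea it describes (one O(n) pass to build the cumulative distribution, then O(log(n)) per draw), written with the standard library instead of NumPy. The name make_sampler and the example data are assumptions for illustration.

```python
import bisect
import itertools
import random
from collections import Counter

def make_sampler(items, weights, rng=random):
    """Precompute the cumulative weights once and return a cheap draw function."""
    cdf = list(itertools.accumulate(weights))  # O(n), done a single time
    total = cdf[-1]
    last = len(cdf) - 1

    def sample():
        # O(log n): binary search for a uniform point in [0, total)
        return items[bisect.bisect(cdf, rng.random() * total, 0, last)]

    return sample

sample = make_sampler(["WHITE", "RED", "GREEN"], [90, 8, 2])
counts = Counter(sample() for _ in range(1000))
print(counts)
```

With K draws the total cost is one accumulate pass plus K binary searches, which matches the O(n + K*log(n)) figure quoted above.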

Answer 14: (score: 1)

If you happen to have Python 3 and are afraid of installing numpy or writing your own loops, you could do:

import itertools, bisect, random

def weighted_choice(choices):
   weights = list(zip(*choices))[1]
   return choices[bisect.bisect(list(itertools.accumulate(weights)),
                                random.uniform(0, sum(weights)))][0]

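Because you can build anything out of a bag of plumbing adaptors! Although... I must admit that Ned's answer, while slightly longer, is easier to understand.

Answer 15: (score: 1)

If your list of weighted choices is relatively static and you want frequent sampling, you can do one O(N) preprocessing step and then do the selection in O(1), using the functions in this related answer.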

# run only when `choices` changes.
preprocessed_data = prep(weight for _,weight in choices)

# O(1) selection
value = choices[sample(preprocessed_data)][0]
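
The prep and sample functions from the related answer are not reproduced in this thread. One well-known scheme with this O(N) setup and O(1) draw profile is the alias method (Vose's variant). The sketch below is an assumption about the kind of technique meant, not the linked code, and the names build_alias_table and alias_sample are made up for illustration.

```python
import random

def build_alias_table(weights):
    """O(N) preprocessing: returns (prob, alias) lists for the alias method."""
    weights = list(weights)
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # mean of scaled values is 1.0
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # keep column s with this probability
        alias[s] = l          # otherwise fall through to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        if scaled[l] < 1.0:
            small.append(l)
        else:
            large.append(l)
    for i in large + small:   # leftovers are full columns
        prob[i] = 1.0
    return prob, alias

def alias_sample(table, rng=random):
    """O(1) draw: pick a column uniformly, then keep it or take its alias."""
    prob, alias = table
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
table = build_alias_table(weight for _, weight in choices)
value = choices[alias_sample(table)][0]
print(value)
```

The table only needs rebuilding when the weights change; every draw after that uses two random numbers and no search.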

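Answer 16: (score: 1)

A general solution:

import random

def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]
    # Reached only when threshold is (numerically) 0; fall back to the last choice
    return choices[-1]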

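Answer 17: (score: 0)

I looked at the other thread that was pointed to and came up with this variation in my coding style. It returns the index of the choice for the purpose of tallying, but returning the string is simple (see the commented-out return alternative):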

import random
import bisect

try:
    range = xrange
except NameError:  # Python 3 has no xrange; the built-in range is already lazy
    pass

def weighted_choice(choices):
    total, cumulative = 0, []
    for c,w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

tally = [0 for item in choices]

n = 100000
# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1

print([t/sum(tally)*100 for t in tally])

Answer 18: (score: 0)

One way is to randomize on the total of all the weights and then use the cumulative values as the limit points for each item. Here is a crude implementation as a generator.

import random

def rand_weighted(weights):
    """
    Generator which uses the weights to generate
    weighted random values
    """
    sum_weights = sum(weights.values())
    cum_weights = {}
    current_weight = 0
    for key, value in sorted(weights.items()):
        current_weight += value
        cum_weights[key] = current_weight
    while True:
        # No int() truncation here, so fractional weights work too
        sel = random.uniform(0, 1) * sum_weights
        for key, value in sorted(cum_weights.items()):
            if sel < value:
                break
        yield key

Answer 19: (score: 0)

Using numpy:

import numpy as np

def choice(items, weights):
    return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]

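Answer 20: (score: 0)

I needed to do something like this really fast and really simply; after searching for ideas, I finally built this template. The idea is to receive the weighted values as JSON from an API, which is simulated here by a dict.

It is then translated into a list in which each value is repeated in proportion to its weight, and random.choice is used to select a value from that list.

I tried running it with 10, 100 and 1000 iterations. The distribution seems pretty solid.

import random

def weighted_choice(weighted_dict):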
    """Input example: dict(apples=60, oranges=30, pineapples=10)"""
    weight_list = []
    for key in weighted_dict.keys():
        weight_list += [key] * weighted_dict[key]
    return random.choice(weight_list)

Answer 21: (score: 0)

Another way of doing this, assuming the weights are stored at the same indices as the elements in the element array.

import numpy as np
weights = [0.1, 0.3, 0.5] #weights for the item at index 0,1,2
# sum of weights should be <=1, you can also divide each weight by sum of all weights to standardise it to <=1 constraint.
trials = 1 #number of trials
num_item = 1 #number of items that can be picked in each trial
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# gives number of times an item was selected at a particular index
# this assumes selection with replacement
# one possible output
# selected_item_arr
# array([[0, 0, 1]])
# say if trials = 5, then the possible output could be
# selected_item_arr
# array([[1, 0, 0],
#   [0, 0, 1],
#   [0, 0, 1],
#   [0, 1, 0],
#   [0, 0, 1]])

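Now let's assume we have to sample 3 items in 1 trial. You can imagine that three kinds of balls, R, G and B, are present in large quantity in the ratio given by the weight array; the following could be a possible outcome: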

num_item = 3
trials = 1
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# selected_item_arr can give output like :
# array([[1, 0, 2]])

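You can also think of the number of items to be selected as the number of binomial/multinomial trials within a set. So the example above can still work as: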

num_binomial_trial = 5
weights = [0.1,0.9] #say an unfair coin weights for H/T
num_experiment_set = 1
selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
# possible output
# selected_item_arr
# array([[1, 4]])
# i.e. H came 1 time and T came 4 times in 5 binomial trials. And one set contains 5 binomial trials.

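Answer 22: (score: 0)

Sebastian Thrun gave a lecture on this in the free Udacity course AI for Robotics. Basically, he makes a circular array of the indexed weights using the mod operator %, sets a variable beta to 0, randomly chooses an index, and for-loops through N, where N is the number of indices. Inside the for loop he first increments beta by the formula:

beta = beta + uniform sample from {0 ... 2 * Weight_max}

and then, nested in the for loop, comes a while loop as below: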

while w[index] < beta:
    beta = beta - w[index]
    index = index + 1

select p[index]

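Then on to the next index to resample based on the probabilities (or normalized probabilities in the case presented in the course).

Lecture link: https://classroom.udacity.com/courses/cs373/lessons/48704330/concepts/487480820923

I am logged into Udacity with my school account, so if the link does not work, it's Lesson 8, video number 21 of Artificial Intelligence for Robotics, where he is lecturing on particle filters.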
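
The procedure described above can be sketched in Python as follows. This is a reconstruction from the description in this answer, not the course's own code; resampling_wheel and its parameters are illustrative names, and the weights are assumed to be non-negative with at least one positive value.

```python
import random

def resampling_wheel(particles, weights, rng=random):
    """Draw len(particles) items, each with probability proportional to its weight."""
    n = len(particles)
    index = rng.randrange(n)      # start at a random position on the wheel
    beta = 0.0
    max_w = max(weights)
    resampled = []
    for _ in range(n):
        beta += rng.uniform(0, 2.0 * max_w)
        # Step around the wheel until the current slot can absorb beta
        while weights[index] < beta:
            beta -= weights[index]
            index = (index + 1) % n   # the mod makes the array circular
        resampled.append(particles[index])
    return resampled

print(resampling_wheel(["a", "b", "c"], [0.2, 0.3, 0.5]))
```

Heavier slots absorb more of each step, so their particles are repeated more often in the output.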

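Answer 23: (score: 0)

Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it returns an array of 0s containing a 1 that indicates which bin was chosen. The code defaults to making a single draw, but you can pass in the number of draws to make, and the counts per bin drawn will be returned.

If the weights vector does not sum to 1, it is normalized so that it does.

import numpy as np

def weighted_choice(weights, n=1):
    # Accept plain lists as well as arrays
    weights = np.asarray(weights, dtype=float)
    if np.sum(weights)!=1:
        weights = weights/np.sum(weights)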

    draws = np.random.random_sample(size=n)

    weights = np.cumsum(weights)
    weights = np.insert(weights,0,0.0)

    counts = np.histogram(draws, bins=weights)
    return(counts[0])

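Answer 24: (score: -1)

I didn't like the syntax of any of those. I really just wanted to specify what the items are and what the weighting of each is. I realize I could have used random.choices, but instead I quickly wrote the class below.

import random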
from numpy import cumsum

class randomChoiceWithProportions:
    '''
    Accepts a dictionary of choices as keys and weights as values. Example if you want a unfair dice:


    choiceWeightDic = {"1":0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666
    , "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
    dice = randomChoiceWithProportions(choiceWeightDic)

    samples = []
    for i in range(100000):
        samples.append(dice.sample())

    # Should be close to .26666
    samples.count("6")/len(samples)

    # Should be close to .16666
    samples.count("1")/len(samples)
    '''
    def __init__(self, choiceWeightDic):
        self.choiceWeightDic = choiceWeightDic
        weightSum = sum(self.choiceWeightDic.values())
        # Compare with a tolerance: float weights rarely sum to exactly 1
        assert abs(weightSum - 1) < 1e-9, 'Weights sum to ' + str(weightSum) + ', not 1.'
        self.valWeightDict = self._compute_valWeights()

    def _compute_valWeights(self):
        valWeights = list(cumsum(list(self.choiceWeightDic.values())))
        valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
        return valWeightDict

    def sample(self):
        num = random.uniform(0,1)
        for key, val in self.valWeightDict.items():
            if val >= num:
                return key

Answer 25: (score: -1)

Provide random.choice() with a pre-weighted list:

Solution & Test:

import random

options = ['a', 'b', 'c', 'd']
weights = [1, 2, 5, 2]

weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
weighted_options = [opt for sublist in weighted_options for opt in sublist]
print(weighted_options)

# test

counts = {c: 0 for c in options}
for x in range(10000):
    counts[random.choice(weighted_options)] += 1

for opt, wgt in zip(options, weights):
    wgt_r = counts[opt] / 10000 * sum(weights)
    print(opt, counts[opt], wgt, wgt_r)

Output:

['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
a 1025 1 1.025
b 1948 2 1.948
c 5019 5 5.019
d 2008 2 2.008