Question

I have a list of 100,000 objects. Each list element has an associated "weight", which is a positive integer from 1 to N.

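What is the most efficient way to select a random element from the list? I want the randomly chosen elements to follow the same distribution as the weights in the list.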

For example, if I have a list L = {1,1,2,5}, I want the 4th element to be selected 5/9 of the time on average.
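Stated generally (the symbols w_i, S and P(i) are our notation, not the asker's): with weights w_1, ..., w_n, element i should be picked with probability proportional to its weight, which reproduces the 5/9 in the example.

```latex
P(i) = \frac{w_i}{S}, \qquad S = \sum_{j=1}^{n} w_j,
\qquad \text{e.g. } P(4) = \frac{5}{1 + 1 + 2 + 5} = \frac{5}{9}
```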

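Assume that insertions and deletions are common on this list, so any approach using "integral area tables" would need to be updated often. I'm hoping for a solution that needs O(1) runtime and O(1) extra memory.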

Answer 1

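You can use an augmented binary search tree to store the elements, along with the sum of the weights in each subtree. This lets you insert and delete elements and weights as needed. Sampling and updates both take O(lg n) time per operation, and the space usage is O(n).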

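Sampling is done by generating a random integer in [1, S], where S is the sum of all weights (S is stored at the root of the tree), and then performing a binary search down the tree using the weight sums stored for each subtree.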
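A minimal sketch of the subtree-sum idea, assuming positive integer weights as in the question. To keep it short it swaps the balanced BST for an array-based complete binary tree (often called a sum tree or segment tree); the class `SumTree` and its methods are hypothetical names, not a library API. Each internal node stores its subtree's total weight, so the root holds S, and both updates and sampling walk one root-to-leaf path in O(lg n).

```python
import random


class SumTree:
    """Hypothetical helper: complete binary tree stored in a flat list.

    tree[1] is the root; node k has children 2k and 2k+1; leaves live at
    tree[size:2*size]. Every internal node stores the total weight of its
    subtree, so tree[1] is S, the sum of all weights.
    """

    def __init__(self):
        self.size = 1            # number of leaf slots (always a power of two)
        self.tree = [0, 0]       # index 0 is unused
        self.items = [None]      # element stored in each leaf slot
        self.count = 0           # leaf slots handed out so far

    def total(self):
        return self.tree[1]

    def _set(self, slot, weight):
        # Write one leaf, then refresh the sums on the path to the root: O(lg n).
        node = self.size + slot
        self.tree[node] = weight
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def _grow(self):
        # Double the leaf capacity and rebuild all sums: O(n), amortised over inserts.
        leaves = self.tree[self.size:self.size + self.count]
        self.size *= 2
        self.items += [None] * (self.size - len(self.items))
        self.tree = [0] * (2 * self.size)
        self.tree[self.size:self.size + len(leaves)] = leaves
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def insert(self, item, weight):
        """Add item with a positive integer weight; returns its slot for later removal."""
        if self.count == self.size:
            self._grow()
        slot = self.count
        self.count += 1
        self.items[slot] = item
        self._set(slot, weight)
        return slot

    def remove(self, slot):
        """Delete by zeroing the weight; the slot is simply left unused."""
        self.items[slot] = None
        self._set(slot, 0)

    def find(self, r):
        """Descend to the leaf whose weight range contains r, for 1 <= r <= S."""
        node = 1
        while node < self.size:
            left = self.tree[2 * node]
            if r <= left:
                node = 2 * node          # r falls inside the left subtree's range
            else:
                r -= left                # skip over the left subtree's weight
                node = 2 * node + 1
        return self.items[node - self.size]

    def sample(self, rng=random):
        # Uniform integer in [1, S], then a binary search down the tree.
        return self.find(rng.randint(1, self.total()))
```

For L = {1,1,2,5}, `find(r)` maps r = 1..9 to the elements a, b, c, c, d, d, d, d, d, so the 4th element is chosen 5/9 of the time. Zero-weight (removed) slots are never reached, because the descent only enters a subtree whose sum is at least r.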

Answer 2

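I really like jonderry's solution, but I wonder whether this problem needs a structure as complex as an augmented binary search tree. Suppose we keep two arrays: one with the input weights, say a = {1,1,2,5}, and one with the cumulative weights (very similar to jonderry's solution), which would be b = {1,2,4,9}. Now generate a random number x in [1, 9] and binary-search for it in the cumulative-sum array. Find the position i where b[i] >= x and b[i-1] < x (treating b[0] as 0), and return the i-th element, whose weight is a[i]. So if the random number is 3, we get i = 3, and the third element, with weight a[3] = 2, is returned. This gives the same O(lg n) sampling time as the augmented tree solution and is much easier to implement, although every insertion or deletion means rebuilding the cumulative array in O(n) time.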
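A minimal sketch of the two-array approach using Python's standard `itertools.accumulate` and `bisect.bisect_left`; the function names are illustrative, and indices here are 0-based. `bisect_left` returns the first index i with b[i] >= x, which is exactly the position described above.

```python
import bisect
import itertools
import random


def build_cumulative(weights):
    """b[i] = weights[0] + ... + weights[i]; O(n) to build (and to rebuild after updates)."""
    return list(itertools.accumulate(weights))


def pick_index(cumulative, x):
    """0-based index i with cumulative[i-1] < x <= cumulative[i], found in O(lg n)."""
    return bisect.bisect_left(cumulative, x)


def weighted_choice(items, cumulative, rng=random):
    # Uniform integer x in [1, S], where S = cumulative[-1] is the total weight.
    x = rng.randint(1, cumulative[-1])
    return items[pick_index(cumulative, x)]
```

With weights [1, 1, 2, 5] the cumulative array is [1, 2, 4, 9], and x = 3 gives index 2, the third element.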

Answer 3

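A solution that runs in O(n) is to start by selecting the first element. Then, for each subsequent element, either keep the element you have or replace it with the next one. Let w be the sum of the weights of all elements considered so far. Keep the old element with probability w/(w+x) and choose the new one with probability p = x/(w+x), where x is the weight of the next element.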
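A minimal sketch of this one-pass scheme, assuming the input is an iterable of (item, weight) pairs with positive weights (the function name is illustrative). It needs only O(1) extra memory, and since it never builds an index it is unaffected by insertions and deletions, at the cost of O(n) time per draw.

```python
import random


def streaming_weighted_choice(pairs, rng=random):
    """Pick one item from (item, weight) pairs in a single pass; None if empty."""
    chosen = None
    w = 0  # sum of the weights seen so far
    for item, x in pairs:
        w += x
        # Switch to the new item with probability x / w (w already includes x),
        # i.e. keep the old one with probability (w - x) / w. The first item
        # is always taken because rng.random() < 1.
        if rng.random() * w < x:
            chosen = item
    return chosen
```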

Answer 4

Here's how I solved it:

def rchoose(list1, weights):
    '''
    list1   :    list of elements you're picking from.
    weights :    list of weights. Has to be in the same order as the 
                 elements of list1. It can be given as the number of counts 
                 or as a probability.
    '''

    import numpy as np

    # normalizing the weights list
    w_sum = sum(weights)
    weights_normalized = []
    for w in weights:
        weights_normalized.append(w/w_sum)

    # sorting the normalized weights and the elements together; the key
    # compares weights only, so ties never fall back to comparing the
    # elements themselves (which may not be orderable)
    pairs = sorted(zip(weights_normalized, list1), key=lambda pair: pair[0])
    weights_normalized = [w for w, _ in pairs]
    list1 = [item for _, item in pairs]

    # turning the normalized weights into cumulative sums (the last one is ~1.0)
    cumulative = []
    count = 0
    for item in weights_normalized:
        count += item
        cumulative.append(count)
    weights_normalized = cumulative

    # testing which interval the uniform random number falls in
    random_number = np.random.uniform(0, 1)
    for idx, w in enumerate(weights_normalized[:-1]):
        if random_number <= w:
            return list1[idx]

    return list1[-1]
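For comparison, Python's standard library (3.6 and later) already provides this weighted draw through `random.choices`, which accepts raw, unnormalized weights; a minimal sketch (the wrapper name `pick` is ours):

```python
import random


def pick(list1, weights, rng=random):
    """Return one element of list1, chosen with probability proportional to its weight."""
    # random.choices returns a list of k picks; k defaults to 1.
    return rng.choices(list1, weights=weights)[0]
```

Unlike `rchoose`, this needs no normalization or sorting, and the weights do not have to sum to 1.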

Answer 5

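If you know an upper bound on the weights (in your case, the maximum weight 5) AND you use a random-access data structure (a linked list implies O(n) access time), then it can be done quickly by rejection sampling:

1) Pick a random element (O(1)). Each element is picked with probability 1/num_elems in this step, so step 2 only has to correct for the weights.

2) Compute its acceptance probability: weight/max_weight. (Scaling by the total instead, num_elems * (weight/total_weight), only works if that value never exceeds 1, which holds only when all weights are equal; for L = {1,1,2,5} the 4th element would get 4 * 5/9 = 20/9.)

3) Take a random number in the range 0 to 1; if it is less than the acceptance probability, output the element. If not, repeat from step 1).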
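A minimal sketch of this rejection-sampling loop, assuming the acceptance probability is weight/max_weight so it never exceeds 1; the function name and the `max_weight` parameter are illustrative. Each round is O(1), and the expected number of rounds is num_elems * max_weight / total_weight (4 * 5 / 9, about 2.2, for L = {1,1,2,5}), so it stays fast as long as no single weight dwarfs the average.

```python
import random


def rejection_choice(items, weights, max_weight=None, rng=random):
    """Pick items[i] with probability weights[i] / sum(weights)."""
    if max_weight is None:
        max_weight = max(weights)  # O(n); pass it in to keep each round O(1)
    n = len(items)
    while True:
        i = rng.randrange(n)                  # step 1: uniform index
        accept = weights[i] / max_weight      # step 2: acceptance probability <= 1
        if rng.random() < accept:             # step 3: accept, or retry from step 1
            return items[i]
```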

Selecting a random element from a weighted list
