Question

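I want to mimic the functionality of random.sample() in Python, but with a non-uniform (in this case triangular) distribution over the choices. It is important that no single item is chosen twice (as described in the random.sample docs). This is what I have: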

...

import math
import random

def tri_sample(population, k, mode=0):
    """
    Mimics the functionality of random.sample() but with a triangular
    distribution over the length of the sequence.

    Mode defaults to 0, which favors lower indices.
    """
    psize = len(population)
    if k > psize:
        raise ValueError("k must not exceed the number of items in population.")
    if not 0 <= mode <= psize:
        raise ValueError("mode must lie between 0 and the number of items in population.")
    indices_chosen = set()  # a set gives O(1) duplicate checks
    sample = []
    for _ in range(k):
        # Redraw until we get an index that has not been chosen yet;
        # this is what keeps the selections unique.
        while True:
            # min() guards against a draw landing exactly on psize,
            # which would be out of range.
            choice = min(math.floor(random.triangular(0, psize, mode)), psize - 1)
            if choice not in indices_chosen:
                break
        indices_chosen.add(choice)
        sample.append(population[choice])
    return sample

...

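I suspect this is not the ideal way to prevent duplicate items from being drawn. My first thought when designing this was to make a copy of population and .pop() items out of it as they are sampled, so the same item cannot be chosen twice, but I saw two problems with that:

If population is a list of objects, duplicating the list might make it hard to ensure that the items in sample point to the same objects as in population.
Using .pop() on the population changes its size, which changes the distribution on every draw. Ideally, the distribution (not sure I'm using the term correctly; I mean the probability of each item being chosen) would be the same regardless of the order in which items are chosen.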
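On the first concern: in Python, list(population) (like population.copy() or population[:]) is a shallow copy, so the new list holds references to the very same element objects. A minimal illustration, using hypothetical dict items as stand-ins for arbitrary objects:

```python
# Hypothetical objects standing in for the real population items.
population = [{"id": i} for i in range(5)]

pool = list(population)          # shallow copy: new list object, same elements
assert pool is not population
assert all(a is b for a, b in zip(pool, population))

item = pool.pop(2)               # removing from the copy...
assert item is population[2]     # ...returns the very same object
assert len(population) == 5      # ...and leaves the original list untouched
```

So identity is preserved; only the second concern (the shrinking pool changing the per-draw probabilities) remains.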

Is there a more efficient way to take a non-uniform random sample from a population?

Answer 1

You can use numpy.random.choice to achieve what you want.

The signature of this function is:

numpy.random.choice(a, size=None, replace=True, p=None)

So you can pass the desired probability distribution as the weight vector p, and set replace=False so that no item is sampled more than once.
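A sketch of the numpy.random.choice route, assuming the goal is to reproduce the per-draw probabilities of the question's math.floor(random.triangular(0, psize, mode)). Under that assumption, the weight for index i is the triangular CDF difference F(i+1) - F(i) on [0, n]. The helper names triangular_bin_weights and tri_sample_choice are hypothetical. It uses the Generator API (np.random.default_rng().choice), which accepts the same a, size, replace and p arguments as the legacy numpy.random.choice.

```python
import numpy as np

def triangular_bin_weights(n, mode=0.0):
    """Hypothetical helper: P(floor(X) == i) for X ~ Triangular(0, n, mode).

    These match the per-draw probabilities of
    math.floor(random.triangular(0, n, mode)) for i = 0..n-1.
    """
    def cdf(x):
        # CDF of the continuous triangular distribution on [0, n].
        if x <= 0:
            return 0.0
        if x >= n:
            return 1.0
        if x <= mode:
            return x * x / (n * mode)
        return 1.0 - (n - x) ** 2 / (n * (n - mode))

    w = np.array([cdf(i + 1) - cdf(i) for i in range(n)])
    return w / w.sum()  # renormalize to absorb tiny floating-point drift

def tri_sample_choice(population, k, mode=0.0, rng=None):
    """Hypothetical helper: k distinct items, triangular weights over indices."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(population)
    if not 0 <= mode <= n:
        raise ValueError("mode must lie in [0, len(population)]")
    p = triangular_bin_weights(n, mode)
    # replace=False guarantees no index is returned twice;
    # choice() raises ValueError itself if k > n.
    idx = rng.choice(n, size=k, replace=False, p=p)
    return [population[i] for i in idx]
```

Passing a seeded generator, for example tri_sample_choice(items, 5, rng=np.random.default_rng(0)), makes the draw reproducible. Because sampling is done on indices, the returned items are the same objects as in population.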

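Alternatively, you can sample directly from the triangular distribution with numpy.random.triangular. Do this in a loop, and add each new result to the list only if it has not appeared before.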
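A minimal sketch of that loop, assuming indices are obtained by flooring the continuous draw as in the question. The function name tri_sample_rejection is hypothetical. Note the argument order: numpy's triangular(left, mode, right) differs from Python's random.triangular(low, high, mode).

```python
import numpy as np

def tri_sample_rejection(population, k, mode=0.0, rng=None):
    """Hypothetical helper: redraw triangular indices until k are distinct."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(population)
    if k > n:
        raise ValueError("k must not exceed len(population)")
    if not 0 <= mode <= n:
        raise ValueError("mode must lie in [0, len(population)]")
    chosen = []   # indices in the order they were drawn
    seen = set()  # O(1) membership test for duplicates
    while len(chosen) < k:
        # numpy order is (left, mode, right); int() floors the non-negative
        # draw, and min() guards the edge case of a draw equal to n.
        i = min(int(rng.triangular(0, mode, n)), n - 1)
        if i not in seen:
            seen.add(i)
            chosen.append(i)
    return [population[i] for i in chosen]
```

Each redraw uses the same fixed distribution, but the loop can become slow as k approaches len(population), because most draws then land on indices that were already taken.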

Mimic random.sample() for a non-uniform distribution