Question

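I have a long, reasonably sparse boolean vector from which I want to repeatedly select random elements, and I am wondering what the most efficient way to do this is.

The vector can be up to 100,000 elements long, and roughly 1 element in 20 is "true" at any given time.

Selecting one of these elements will occasionally make other elements available for selection, so I can't just make a single initial pass over the boolean vector to collect the indices of all available elements, shuffle that index vector, and pop elements off it, because the list of available elements changes.

I have come up with a few ideas but can't decide which is best, so any insight would be much appreciated.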

Method 1:

given input boolean vector A
create boolean vector B    // to store previously selected elements
create int vector C        // to store currently available element indices
while stopping condition not met:
    clear C
    for each element a in A:
        if a is "true":
            append index of a to C
    generate random integer i in [0, length of C)
    set element C[i] of A to "false"
    set element C[i] of B to "true"
    compute any new "true" values of A

Method 2:

given input boolean vector A
create boolean vector B    // to store previously selected elements
create int vector C        // to store currently available element indices
for each element a in A:
    if a is "true":
        append index of a to C
shuffle C
while stopping condition not met:
    pop index i from back of C
    set element i of A to "false"
    set element i of B to "true"
    compute any new "true" values of A
    if new values in A computed:
        append indices of new available elements to C
        shuffle C

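Because not every selection from A changes the set of available elements, I think Method 2 would probably outperform Method 1, except that I'm not sure how much work shuffling a long vector involves.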
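For concreteness, Method 2 might look roughly like the C++ sketch below. The function name `run_method2` and the `unlock` callback are hypothetical: `unlock(i)` stands in for the problem-specific step that reports which indices become available after selecting `i`, and the loop simply runs until nothing is left, which is only one possible stopping condition. Note that each `std::shuffle` call is linear in the current size of `c`.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical driver for Method 2. `unlock(i)` is an assumed callback that
// returns the indices that become available once element i has been selected.
template <class Rng>
std::vector<std::size_t> run_method2(
    std::vector<bool> a, Rng& rng,
    const std::function<std::vector<std::size_t>(std::size_t)>& unlock) {
    std::vector<std::size_t> c;                 // currently available indices
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i]) c.push_back(i);
    std::shuffle(c.begin(), c.end(), rng);

    std::vector<bool> b(a.size(), false);       // previously selected elements
    std::vector<std::size_t> order;             // selection order, for inspection
    while (!c.empty()) {                        // assumed stopping condition
        std::size_t i = c.back();
        c.pop_back();
        a[i] = false;
        b[i] = true;
        order.push_back(i);

        bool grew = false;
        for (std::size_t j : unlock(i)) {
            // Only add indices that are neither available nor already used.
            if (!a[j] && !b[j]) { a[j] = true; c.push_back(j); grew = true; }
        }
        if (grew) std::shuffle(c.begin(), c.end(), rng);  // O(|c|) per reshuffle
    }
    return order;
}
```

The reshuffle only happens when a selection actually unlocks something, so its total cost depends on how often that occurs.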

Method 3:

given input boolean vector A
create boolean vector B    // to store previously selected elements
while stopping condition not met:
    generate random integer i in [0, length of A)
    if element i of A is "true":
        set element i of A to "false"
        set element i of B to "true"
        compute any new "true" values of A

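This last approach looks a bit naive and simple, but I figure that if about 1 in 20 elements is true (except towards the end, when no more elements become available to replace the selected ones), then on average it will take only about 20 attempts to find a selectable element. That may actually be less work than a full pass over the input vector or a shuffle of the vector of available indices, especially if the vectors in question are fairly long. Finding the last few elements will be very hard, but I can keep track of how many have been selected and, once the number remaining drops below some level, switch to a different way of selecting the final batch.

Does anyone have an idea of which would be more efficient? In case it makes any difference, the implementation will be in C++.

Thanks for your help.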
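The "about 20 attempts" estimate follows from each probe succeeding independently with probability p = (number of true elements) / (length of A): the number of probes is geometrically distributed with mean 1/p, which is 20 at a density of 1 in 20. A minimal sketch of the probing step, with a cap on attempts so the caller can switch strategies for the final batch, could look like this. The name `pick_by_rejection` and the `max_tries` parameter are assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: probe random positions of `a` until a true one is found.
// Returns a.size() if no true element turns up within max_tries probes.
template <class Rng>
std::size_t pick_by_rejection(const std::vector<bool>& a, Rng& rng,
                              std::size_t max_tries) {
    if (a.empty()) return a.size();
    std::uniform_int_distribution<std::size_t> pick(0, a.size() - 1);
    for (std::size_t t = 0; t < max_tries; ++t) {
        std::size_t i = pick(rng);   // uniform over all positions
        if (a[i]) return i;          // accept only "true" positions
    }
    return a.size();  // give up; the caller should fall back to another method
}
```

Every true element is equally likely to be returned, since each probe is uniform over all positions and only true positions are accepted.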

Answer 1

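You can change the representation of your sparse vector to the following:

Primary Vector (the vector you have now)
True Vector (a list of the indices of all "true" elements)

Your operations now become: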

Insert:   
    check if i in Primary Vector
    if false, set to true and add to True Vector

Delete:
    check if i in Primary Vector
    if true, set to false and remove from True Vector by swapping
    with last element and reducing size

(This requires a pointer from each Primary Vector element to its position in the True Vector.)

Random:
    Generate random index j from size of (True Vector)
    return True Vector[j]

All of your operations can be done in O(1) time.
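The three operations above could be sketched in C++ roughly as follows. This is one possible layout, not the only one: the class name `SparseBoolSet` and the sentinel `kAbsent` are made up for the example, and the "pointer" from the Primary Vector is stored as a slot number, so a single array serves as both the boolean flag and the back-reference.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sentinel meaning "index is currently false" (hypothetical name).
constexpr std::size_t kAbsent = static_cast<std::size_t>(-1);

// Hypothetical class: the Primary Vector paired with a dense True Vector.
class SparseBoolSet {
public:
    explicit SparseBoolSet(std::size_t n) : pos_(n, kAbsent) {}

    bool contains(std::size_t i) const { return pos_[i] != kAbsent; }
    std::size_t count() const { return trues_.size(); }

    // Insert: if i is false, mark it true and append it to the True Vector.
    void insert(std::size_t i) {
        if (contains(i)) return;
        pos_[i] = trues_.size();
        trues_.push_back(i);
    }

    // Delete: move the last true index into i's slot, then shrink the list.
    void erase(std::size_t i) {
        if (!contains(i)) return;
        std::size_t slot = pos_[i];
        std::size_t last = trues_.back();
        trues_[slot] = last;
        pos_[last] = slot;
        trues_.pop_back();
        pos_[i] = kAbsent;
    }

    // Random: return a uniformly chosen true index (set must be non-empty).
    template <class Rng>
    std::size_t sample(Rng& rng) const {
        assert(!trues_.empty());
        std::uniform_int_distribution<std::size_t> pick(0, trues_.size() - 1);
        return trues_[pick(rng)];
    }

private:
    std::vector<std::size_t> pos_;    // pos_[i] = slot of i in trues_, or kAbsent
    std::vector<std::size_t> trues_;  // indices that are currently true
};
```

In the question's loop, each step would then be `sample`, `erase` on the chosen index, and `insert` for any indices that become available.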

Answer 2

This sounds like a case for a Van Emde Boas tree.

Space   O(M)
Search  O(log log M)
Insert  O(log log M)
Delete  O(log log M)

Annotate the aux arrays with member counts to make it easier to find a random element.
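To illustrate only the member-count idea, here is a deliberately simplified sketch. It is not a Van Emde Boas tree: it uses a single level of fixed-size clusters rather than the recursive structure, so `select` costs roughly (number of clusters + cluster size) steps instead of O(log log M). The names `CountedClusters`, `select`, and `cluster_size` are hypothetical, and `cluster_size` is assumed to be at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Simplified stand-in, NOT a full Van Emde Boas tree: one level of clusters,
// each annotated with how many of its elements are currently true.
class CountedClusters {
public:
    CountedClusters(std::size_t n, std::size_t cluster_size)
        : bits_(n, false), cluster_size_(cluster_size),
          counts_((n + cluster_size - 1) / cluster_size, 0) {}

    // Set element i and keep the cluster count and the total in sync.
    void set(std::size_t i, bool value) {
        if (bits_[i] == value) return;
        bits_[i] = value;
        if (value) { ++counts_[i / cluster_size_]; ++total_; }
        else       { --counts_[i / cluster_size_]; --total_; }
    }

    std::size_t count() const { return total_; }

    // Return the index of the r-th true element (0-based, r < count()).
    std::size_t select(std::size_t r) const {
        assert(r < total_);
        std::size_t c = 0;
        while (r >= counts_[c]) { r -= counts_[c]; ++c; }  // skip whole clusters
        for (std::size_t i = c * cluster_size_;; ++i) {    // scan inside cluster c
            if (bits_[i]) {
                if (r == 0) return i;
                --r;
            }
        }
    }

    // Pick a uniformly random true element (count() must be non-zero).
    template <class Rng>
    std::size_t sample(Rng& rng) const {
        assert(total_ > 0);
        std::uniform_int_distribution<std::size_t> pick(0, total_ - 1);
        return select(pick(rng));
    }

private:
    std::vector<bool> bits_;
    std::size_t cluster_size_;
    std::vector<std::size_t> counts_;  // true elements per cluster
    std::size_t total_ = 0;            // true elements overall
};
```

The counts let a random rank be turned into an element without scanning the whole vector; in a real Van Emde Boas tree the same annotation would be applied at every level of the recursion.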

What is the most efficient way to select random elements from a long (reasonably) sparse vector?