Question

Here is the function I want:

random_select(contain_list, ttl_num, sample_num)

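There are ttl_num integers to choose from, 0 through ttl_num-1. I want to return a list of sample_num unique integers in which every number provided in contain_list must appear, with the other numbers chosen at random.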

I have to run this query very often, each time with a different contain_list, but ttl_num and sample_num are the same for all queries.

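What I currently do is first generate the set of ttl_num integers, subtract contain_list from that set, randomly select some numbers from what remains without replacement, and then concatenate them with contain_list to get the result.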

I believe this is not the fastest way. Any better ideas?

Global variables are fine if needed.

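Edit:
sample_num is no less than the length of contain_list, and I would like to get contain_list plus sample_num - contain_list.length other random numbers. The numbers in contain_list are guaranteed to all lie in the range 0 to ttl_num-1.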
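
For reference, the set-based approach described in the question might look roughly like the sketch below. The name random_select_baseline is hypothetical, and the sketch assumes contain_list holds unique integers in the range 0 to ttl_num-1. It is only a baseline to benchmark the answers against, not the asker's actual code.

```python
import random

def random_select_baseline(contain_list, ttl_num, sample_num):
    # Hypothetical name; a sketch of the set-subtraction approach.
    # Everything in 0..ttl_num-1 that is not already required.
    available = set(range(ttl_num)) - set(contain_list)
    # random.sample needs a sequence, so convert the set to a list first.
    extra = random.sample(list(available), sample_num - len(contain_list))
    return list(contain_list) + extra
```

Rebuilding the full set on every call costs time proportional to ttl_num, which is the part the answers try to make cheaper.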

Answer 1

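Here are a couple of possibilities. Neither is any less complicated than what you already have, but depending on the sizes of the parameter values, one or both of them may turn out to be faster. Only benchmarking with your actual data will tell for sure.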

Method 1

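The logic here is basically the same as what you are doing. It just replaces the set generation and manipulation with an integer array, which should be lighter. It does, however, require sorting contain_list (descending), so whether it is actually faster than what you already have probably depends on the sizes of contain_list.count and ttl_num.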

1) initialize a tracking var, remaining_num = ttl_num

2) initialize an integer array, available_num[], of size ttl_num with value = index

3) sort contain_list descending

4) iterate through contain_list (now in descending order); for each:
4.1) decrement remaining_num
4.2) swap the element at index = contain_list[i] with the one at index = remaining_num

5) iterate (sample_num - contain_list.count) times; for each:
5.1) generate a random index between 0 and remaining_num (inclusive and exclusive, respectively)
5.2) decrement remaining_num
5.3) swap the element at the selected index with the one at index = remaining_num

6) The resultant samples will start at index remaining_num and run through the end of the array.

Here's a sample run for random_select({3, 7}, 10, 5)...

remaining_num = 10

available_num[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

contain_list = {7, 3}

select the 7
remaining_num = 9
available_num[] = {0, 1, 2, 3, 4, 5, 6, 9, 8, 7}

select the 3
remaining_num = 8
available_num[] = {0, 1, 2, 8, 4, 5, 6, 9, 3, 7}

select a random(0,8), e.g. 2
remaining_num = 7
available_num[] = {0, 1, 9, 8, 4, 5, 6, 2, 3, 7}

select a random(0,7), e.g. 3
remaining_num = 6
available_num[] = {0, 1, 9, 6, 4, 5, 8, 2, 3, 7}

select a random(0,6), e.g. 0
remaining_num = 5
available_num[] = {5, 1, 9, 6, 4, 0, 8, 2, 3, 7}

result = {0, 8, 2, 3, 7}
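
The steps of Method 1 can be sketched in Python as follows. This is a minimal sketch, assuming contain_list holds unique integers in the range 0 to ttl_num-1; it uses a plain list and the standard random module, and it amounts to a partial Fisher-Yates shuffle that parks every chosen value at the tail of the array.

```python
import random

def random_select(contain_list, ttl_num, sample_num):
    remaining_num = ttl_num                    # step 1
    available_num = list(range(ttl_num))       # step 2: value == index
    # Steps 3-4: descending order guarantees that available_num[value]
    # still holds value when its turn comes, because every earlier swap
    # only touched larger indices.
    for value in sorted(contain_list, reverse=True):
        remaining_num -= 1
        available_num[value], available_num[remaining_num] = (
            available_num[remaining_num], available_num[value])
    # Step 5: draw the rest from the still-unselected prefix.
    for _ in range(sample_num - len(contain_list)):
        index = random.randrange(remaining_num)  # 0 <= index < remaining_num
        remaining_num -= 1
        available_num[index], available_num[remaining_num] = (
            available_num[remaining_num], available_num[index])
    return available_num[remaining_num:]       # step 6
```

Each call still builds an array of ttl_num elements, so the gain over the set-based version comes from cheaper operations rather than from doing less work overall.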

Method 2

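If ttl_num is sufficiently large and sample_num sufficiently low, it might be worth turning things inside out. That is, rather than creating and manipulating an array of available numbers, just track a list of the selected numbers. Then, as each random target is chosen, "skip over" the previously selected numbers by iterating through the list of selected numbers and counting how many are less than or equal to the target index.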

1) initialize a tracking var, remaining_num = ttl_num - contain_list.count

2) declare an empty list (vector) of integers, selected_num[]

3) sort contain_list ascending

4) iterate through contain_list (now in ascending order); for each:
4.1) append contain_list[i] to selected_num[]

5) iterate (sample_num - contain_list.count) times; for each:
5.1) generate a random target between 0 and remaining_num (inclusive and exclusive, respectively)
5.2) decrement remaining_num
5.3) iterate through selected_num (in ascending order); for each:
5.3.1) if target >= selected_num[j], increment target; otherwise, stop iterating
5.4) insert target into selected_num[] at the position where the iteration stopped, so the list stays sorted

6) The resultant samples will be all elements in selected_num.

Here's a sample run for random_select({3, 7}, 10, 5)...

remaining_num = 8

selected_num[] = {}

select the 3
selected_num[] = {3}

select the 7
selected_num[] = {3, 7}

select a random(0,8), e.g. target = 2
remaining_num = 7
2 < 3; target still 2, stop
selected_num[] = {2, 3, 7}

select a random(0,7), e.g. target = 3
remaining_num = 6
3 >= 2; target becomes 4
4 >= 3; target becomes 5
5 < 7; target still 5, stop
selected_num[] = {2, 3, 5, 7}

select a random(0,6), e.g. target = 0
remaining_num = 5
0 < 2; target still 0, stop
selected_num[] = {0, 2, 3, 5, 7}

The list has to stay in ascending order for the skipping to be correct. With an unsorted list such as {3, 7, 2}, a target of 2 would pass the 3 and the 7 unchanged, be bumped to 3 by the 2, and collide with the already-selected 3. Obviously, if sample_num is large, iterating through selected_num[] for each new number can get expensive. Keeping the list sorted mitigates this somewhat, because the inner loop can break as soon as it sees a number greater than the target.
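
Method 2 can be sketched in Python as shown below. The name random_select_sparse is hypothetical, and the sketch assumes contain_list holds unique integers in the range 0 to ttl_num-1. It keeps selected_num in ascending order throughout, which the skip step needs in order to avoid landing on an already-selected number.

```python
import random

def random_select_sparse(contain_list, ttl_num, sample_num):
    # Hypothetical name; tracks only the selected numbers, never all ttl_num.
    selected_num = sorted(contain_list)          # kept ascending throughout
    remaining_num = ttl_num - len(selected_num)
    for _ in range(sample_num - len(contain_list)):
        # target is a rank among the numbers that are still unselected.
        target = random.randrange(remaining_num)
        remaining_num -= 1
        position = 0
        # Each selected number <= target shifts the target up by one.
        while position < len(selected_num) and selected_num[position] <= target:
            target += 1
            position += 1
        # Insert where the scan stopped so the list stays sorted.
        selected_num.insert(position, target)
    return selected_num
```

The result comes back sorted rather than in random order. Memory use is proportional to sample_num, while the time is roughly quadratic in sample_num because of the linear scan and the list insertion.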

Answer 2

I just used numpy to write a vectorized version of something similar to Method 1 in James Droscha's answer, and it turns out to be only a few lines of code:

import numpy as np

def random_select(batch, ttl_num, sample_num):
    batch = np.asarray(batch, dtype=np.intp)
    # add the following line if elements in batch are not guaranteed to be unique
    # batch = np.unique(batch)
    batch_size = len(batch)
    # step 2
    candidates = np.arange(ttl_num)
    # step 4: overwrite the head slots that hold batch values with the tail
    # values that are not in batch, so that elements in
    # candidates[:ttl_num-batch_size] are not contained in batch
    tail = candidates[ttl_num - batch_size:]
    candidates[batch[batch < ttl_num - batch_size]] = tail[~np.isin(tail, batch)]
    # step 5
    idx = np.random.choice(ttl_num - batch_size, sample_num - batch_size, replace=False)
    return np.concatenate([candidates[idx], batch])

What is the fastest way to randomly select some numbers in a range while including a given set of numbers?
