Question

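I am looking for the fastest way to draw pairs of values from a list such that the two values in each pair are always different. The naive approach I coded is very slow, so I am fairly sure there are more efficient ways.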

import numpy
listx = range(10)
number_of_couples = 10000
data = numpy.empty([number_of_couples, 2])

for i in range(number_of_couples):
    data[i] = numpy.random.choice(listx, size=2, replace=False)

Answer 1

My suggestion is to use itertools.permutations to cache all ordered pairs from the list, then use random.choice to draw pairs from that cache:

import itertools
import random
import numpy

listx = range(10)
number_of_couples = 10000
permutations = list(itertools.permutations(listx, 2))
data = numpy.array([random.choice(permutations) for _ in range(number_of_couples)])
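A possible variation on the code above, offered as a sketch rather than as part of the original answer: keep the cached permutations in a NumPy array and draw all the row indices in one call, which avoids the Python-level loop. The names draw_pairs, pairs and rows are made up for this illustration, and numpy.random.default_rng assumes NumPy 1.17 or later.

```python
import itertools

import numpy as np


def draw_pairs(values, number_of_couples, rng=None):
    """Draw ordered pairs taken from two distinct positions of `values` (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Cache every ordered pair (x, y) from different positions as an (n*(n-1), 2) array.
    pairs = np.array(list(itertools.permutations(values, 2)))
    # One vectorised draw of row indices replaces the per-pair random.choice call.
    rows = rng.integers(0, len(pairs), size=number_of_couples)
    return pairs[rows]


data = draw_pairs(range(10), 10000)
```

Note that the pairs are distinct by position, so the two values differ only if the input list itself has no duplicates.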

Testing your solution in IPython, it takes 489 ms on average:

489 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My suggestion is more than 25 times faster:

17.2 ms ± 225 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Note that with permutations, (x, y) is different from (y, x):

>>> print(list(itertools.permutations(range(3), 2)))
[(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]

If (x, y) and (y, x) carry the same information in your case, use itertools.combinations instead:

>>> print(list(itertools.combinations(range(3), 2)))
[(0, 1), (0, 2), (1, 2)]
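To make the difference concrete for the ten-element list from the question, here is a small sketch (not from the original answer) that counts both caches and draws unordered pairs from the smaller one. The names ordered, unordered and sample are illustrative.

```python
import itertools
import random

listx = range(10)
ordered = list(itertools.permutations(listx, 2))    # (x, y) and (y, x) are both present
unordered = list(itertools.combinations(listx, 2))  # each pair appears once, in input order

print(len(ordered), len(unordered))  # 90 45

# Drawing from the combinations cache never yields a pair in reversed order.
sample = [random.choice(unordered) for _ in range(1000)]
```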

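One more thing: the main problem with your solution is the data[i] = operation, which is very expensive. Instead of creating an empty array and then modifying it with each chosen pair, you can speed up your solution considerably just by building a new array from a list comprehension: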

import numpy
listx = range(10)
number_of_couples = 10000
data = numpy.array([numpy.random.choice(listx, size=2, replace=False) for i in range(number_of_couples)])  # Create a new array from the choices

See how fast it is:

24.3 ns ± 0.554 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
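Timings like these depend on the machine and on exactly which statement was measured. A per-loop figure in the nanosecond range is far below what 10000 calls to numpy.random.choice can take, so it is worth re-measuring both variants in full. A minimal sketch using the standard timeit module follows; the function names loop_assign and comprehension are illustrative, not from the original posts.

```python
import timeit

import numpy as np

listx = range(10)
number_of_couples = 10000


def loop_assign():
    # Approach from the question: preallocate, then fill row by row.
    data = np.empty([number_of_couples, 2])
    for i in range(number_of_couples):
        data[i] = np.random.choice(listx, size=2, replace=False)
    return data


def comprehension():
    # Approach from this answer: build the array from a list comprehension.
    return np.array([np.random.choice(listx, size=2, replace=False)
                     for _ in range(number_of_couples)])


for func in (loop_assign, comprehension):
    # number=1 so that each measurement covers one full run of 10000 draws.
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{func.__name__}: {best * 1e3:.1f} ms")
```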

Answer 2

This should be fast enough for most applications; see the timings below:

# make an example
>>> import string
>>> import numpy as np
>>> listx = list(string.ascii_letters)
>>> 
>>> L = len(listx)
>>> number_of_couples = 10_000_000
>>> 
>>> idx = np.array((np.random.randint(0, L, (number_of_couples,)), np.random.randint(0, L-1, (number_of_couples,))))
>>> idx[1, idx[0] == idx[1]] = L-1
>>> 
>>> result = np.array(listx)[idx.T]
>>> 
>>> result
array([['p', 't'],
       ['O', 'F'],
       ['M', 'S'],
       ...,
       ['Q', 'k'],
       ['N', 'm'],
       ['f', 'x']], dtype='<U1')
>>> 
# sanity checks
# distribution looks flat
>>> np.bincount(idx.ravel())
array([384357, 385561, 384175, 384998, 385799, 384446, 384356, 384684,
       384305, 384072, 384993, 384346, 385302, 384518, 384659, 384142,
       383554, 384790, 384424, 384032, 383950, 385103, 384092, 384653,
       383428, 385388, 384074, 384197, 384644, 384741, 384343, 384282,
       384192, 385791, 384106, 383872, 384506, 385161, 384401, 384661,
       383978, 385547, 385571, 385941, 385416, 385325, 383997, 385201,
       383998, 384199, 385105, 384624])
# pairs are distinct
>>> np.any(idx[0] == idx[1])
False
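The index trick used above may be easier to follow when spelled out. This explanation and the helper name distinct_index_pairs are my reading of the code, not part of the original answer. The first index is uniform over all L positions; the second is uniform over the first L-1 positions only. Whenever the two collide, the second index is moved to L-1, the one position it could not otherwise reach. For any fixed first index, the second therefore takes each of the remaining L-1 positions with equal probability.

```python
import numpy as np


def distinct_index_pairs(L, number_of_couples, rng=None):
    """Return a (2, number_of_couples) array of index pairs with idx[0] != idx[1]."""
    rng = np.random.default_rng() if rng is None else rng
    first = rng.integers(0, L, size=number_of_couples)       # any of the L positions
    second = rng.integers(0, L - 1, size=number_of_couples)  # never L-1 at this point
    # A collision implies first <= L-2, so slot L-1 is free: send the second index there.
    second[second == first] = L - 1
    return np.stack((first, second))


# Small check: with L = 4, all 4 * 3 = 12 ordered pairs should appear about equally often.
idx = distinct_index_pairs(4, 120_000)
pairs, counts = np.unique(idx.T, axis=0, return_counts=True)
print(len(pairs))  # 12
print(counts)      # each count should be close to 10000
```

Indexing np.array(listx) with idx.T, as in the session above, then turns the index pairs into pairs of list elements.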

Timing comparison with the itertools solution:

>>> def f_np(listx, number_of_couples):
...     L = len(listx)
...     idx = np.array((np.random.randint(0, L, (number_of_couples,)), np.random.randint(0, L-1, (number_of_couples,))))
...     idx[1, idx[0] == idx[1]] = L-1
...     return np.array(listx)[idx.T]
... 
>>> import itertools
>>> import random
>>> def f_it(listx, number_of_couples):
...     permutations = list(itertools.permutations(listx, 2))
...     return np.array([random.choice(permutations) for _ in range(number_of_couples)])
... 
>>> from time import perf_counter
>>> t = perf_counter(); f_it(listx, number_of_couples); s = perf_counter()
array([['s', 'm'],
       ['G', 'w'],
       ['w', 'S'],
       ...,
       ['V', 'R'],
       ['P', 'Q'],
       ['Q', 'J']], dtype='<U1')
>>> s-t
10.544860829017125
>>> t = perf_counter(); f_np(listx, number_of_couples); s = perf_counter()
array([['C', 'T'],
       ['X', 'y'],
       ['U', 's'],
       ...,
       ['U', 'M'],
       ['t', 'i'],
       ['m', 'c']], dtype='<U1')
>>> s-t
0.3759624689701013

Answer 3

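You can use the combinations function from itertools.

Unlike Matheus Portela's answer, combinations are emitted in lexicographic sorted order, so ['A', 'B'] yields only ['AB'], whereas permutations would yield ['AB', 'BA'].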
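As a quick illustration of that AB / BA example (a sketch added for clarity): itertools yields tuples, so they are joined into strings here purely for display, and the variable name letters is made up.

```python
import itertools

letters = ['A', 'B']
print([''.join(p) for p in itertools.combinations(letters, 2)])  # ['AB']
print([''.join(p) for p in itertools.permutations(letters, 2)])  # ['AB', 'BA']
```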

Picking pairs of values from a list without replacement
