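I'm looking for the fastest way to draw pairs of values from a list such that the two values in each pair are always different. The naive approach I coded is very slow, so I'm fairly sure there are more efficient ways.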
import numpy
listx = range(10)
number_of_couples=10000
data=numpy.empty([number_of_couples,2])
for i in range(number_of_couples):
    # replace=False guarantees the two drawn values differ
    data[i] = numpy.random.choice(listx, size=2, replace=False)
Answer 0 (score: 3)
My suggestion is to cache all the ordered pairs in a list using itertools.permutations, then use random.choice to draw pairs from it:
import itertools
import random
import numpy
listx = range(10)
number_of_couples = 10000
permutations = list(itertools.permutations(listx, 2))
data = numpy.array([random.choice(permutations) for _ in range(number_of_couples)])
Timing your solution with IPython, it takes 489 ms on average:
489 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
My suggestion is more than 25 times faster:
17.2 ms ± 225 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Note that with permutations, (x, y) is different from (y, x):
>>> print(list(itertools.permutations(range(3), 2)))
[(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
If (x, y) and (y, x) carry the same information in your case, use itertools.combinations instead:
>>> print(list(itertools.combinations(range(3), 2)))
[(0, 1), (0, 2), (1, 2)]
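For the unordered case, the same cache-then-draw idea could look roughly like the sketch below. It assumes Python 3.6+ (for random.choices) and reuses the listx and number_of_couples values from the question; the name pairs is only illustrative.

```python
import itertools
import random

listx = list(range(10))
number_of_couples = 10000

# Cache every unordered pair once: 10 * 9 / 2 = 45 tuples.
combinations = list(itertools.combinations(listx, 2))

# random.choices draws with replacement from the cached pairs,
# so the same pair may appear more than once across the sample.
pairs = random.choices(combinations, k=number_of_couples)
```

Because listx is sorted here, every drawn pair comes out as (smaller, larger); shuffle each pair afterwards if the order inside a pair matters.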
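One more thing: the main problem with your solution is the data[i] = operation, which is very expensive. Instead of creating an empty array and filling it with the chosen pairs, you can greatly speed up your solution by building a new array from a list comprehension: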
import numpy
listx = range(10)
number_of_couples=10000
data = numpy.array([numpy.random.choice(listx,size=2,replace=False) for i in range(number_of_couples)]) # Create a new array from the choices
See how fast it is:
24.3 ns ± 0.554 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
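A nanosecond-scale figure is hard to reconcile with 10,000 calls to numpy.random.choice, so it may be worth re-measuring both variants on your own machine. A minimal sketch using timeit follows; the function names fill_preallocated and from_comprehension are made up for illustration, and number_of_couples is reduced to 1000 only to keep the run short.

```python
import timeit

import numpy

listx = list(range(10))
number_of_couples = 1000  # smaller than the question's 10000, to keep the run short


def fill_preallocated():
    # The question's approach: allocate first, then assign row by row.
    data = numpy.empty([number_of_couples, 2])
    for i in range(number_of_couples):
        data[i] = numpy.random.choice(listx, size=2, replace=False)
    return data


def from_comprehension():
    # The variant suggested above: build the array from a list comprehension.
    return numpy.array([numpy.random.choice(listx, size=2, replace=False)
                        for _ in range(number_of_couples)])


# Average wall-clock time over three runs of each variant.
t_fill = timeit.timeit(fill_preallocated, number=3) / 3
t_comp = timeit.timeit(from_comprehension, number=3) / 3
print(f"preallocated: {t_fill * 1e3:.1f} ms, comprehension: {t_comp * 1e3:.1f} ms")
```

Timing whole functions this way makes sure the full statement is measured, including every call to numpy.random.choice.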
Answer 1 (score: 1)
This should be fast enough for most applications; see the timings below:
# make an example
>>> import numpy as np
>>> import string
>>> listx = list(string.ascii_letters)
>>>
>>>
>>> L = len(listx)
>>> number_of_couples = 10_000_000
>>>
>>> idx = np.array((np.random.randint(0, L, (number_of_couples,)), np.random.randint(0, L-1, (number_of_couples,))))
>>> idx[1, idx[0] == idx[1]] = L-1
>>>
>>> result = np.array(listx)[idx.T]
>>>
>>> result
array([['p', 't'],
       ['O', 'F'],
       ['M', 'S'],
       ...,
       ['Q', 'k'],
       ['N', 'm'],
       ['f', 'x']], dtype='<U1')
>>>
# sanity checks
# distribution looks flat
>>> np.bincount(idx.ravel())
array([384357, 385561, 384175, 384998, 385799, 384446, 384356, 384684,
       384305, 384072, 384993, 384346, 385302, 384518, 384659, 384142,
       383554, 384790, 384424, 384032, 383950, 385103, 384092, 384653,
       383428, 385388, 384074, 384197, 384644, 384741, 384343, 384282,
       384192, 385791, 384106, 383872, 384506, 385161, 384401, 384661,
       383978, 385547, 385571, 385941, 385416, 385325, 383997, 385201,
       383998, 384199, 385105, 384624])
# pairs are distinct
>>> np.any(idx[0] == idx[1])
False
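Why the remapping step yields uniformly distributed distinct pairs can be checked by brute force on a small case. The sketch below is only an illustration: L = 5 is an arbitrary choice, and it enumerates every possible (first, second) draw instead of sampling.

```python
import itertools

L = 5  # small hypothetical list length, chosen only for illustration

pairs = []
for first in range(L):            # first index: any of the L positions
    for second in range(L - 1):   # second index: drawn from only L-1 positions
        if second == first:
            second = L - 1        # a collision is remapped to the one value never drawn
        pairs.append((first, second))

# Each of the L * (L - 1) equally likely draws lands on a different ordered pair,
# so every pair of distinct indices is equally likely.
assert len(pairs) == len(set(pairs)) == L * (L - 1)
assert sorted(pairs) == sorted(itertools.permutations(range(L), 2))
```

Since the mapping from draws to ordered pairs is one-to-one, no rejection loop is needed and the result stays unbiased.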
Timing comparison with the itertools solution:
>>> import itertools
>>> import random
>>> import numpy
>>> def f_np(listx, number_of_couples):
...     L = len(listx)
...     idx = np.array((np.random.randint(0, L, (number_of_couples,)), np.random.randint(0, L-1, (number_of_couples,))))
...     idx[1, idx[0] == idx[1]] = L-1  # remap collisions to the value never drawn
...     return np.array(listx)[idx.T]
...
>>> def f_it(listx, number_of_couples):
...     permutations = list(itertools.permutations(listx, 2))
...     return numpy.array([random.choice(permutations) for _ in range(number_of_couples)])
...
>>> from time import perf_counter
>>> t = perf_counter(); f_it(listx, number_of_couples); s = perf_counter()
array([['s', 'm'],
       ['G', 'w'],
       ['w', 'S'],
       ...,
       ['V', 'R'],
       ['P', 'Q'],
       ['Q', 'J']], dtype='<U1')
>>> s-t
10.544860829017125
>>> t = perf_counter(); f_np(listx, number_of_couples); s = perf_counter()
array([['C', 'T'],
       ['X', 'y'],
       ['U', 's'],
       ...,
       ['U', 'M'],
       ['t', 'i'],
       ['m', 'c']], dtype='<U1')
>>> s-t
0.3759624689701013
Answer 2 (score: 0)
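You can use the combinations function from itertools.
The difference from Matheus Portela's answer is that combinations emits each unordered pair only once, in lexicographic order, so ['A','B'] yields ['AB'], whereas permutations yields ['AB','BA'].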
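A short sketch of that difference on a two-element list; the variable names are only illustrative, and ''.join is used to get the 'AB' style strings mentioned above.

```python
import itertools

letters = ['A', 'B']

# combinations: each unordered pair appears once
as_combinations = [''.join(p) for p in itertools.combinations(letters, 2)]

# permutations: both orderings of each pair appear
as_permutations = [''.join(p) for p in itertools.permutations(letters, 2)]

print(as_combinations)  # ['AB']
print(as_permutations)  # ['AB', 'BA']
```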