Question

I want to build a numpy array from the result of itertools.product. My first approach was simple:

from itertools import product
import numpy as np

max_init = 6
init_values = range(1, max_init + 1)
repetitions = 12

result = np.array(list(product(init_values, repeat=repetitions)))

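This code works for "small" repetitions (e.g. <= 4), but with "large" values (>= 12) it uses up all the memory and crashes. I assumed that building the list was what was eating all the RAM, so I searched for how to build the array directly. I found "Numpy equivalent of itertools.product" and "Using numpy to build an array of all combinations of two arrays".

So, I tested the following alternatives:

Alternative #1: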

results = np.empty((max_init**repetitions, repetitions))
for i, row in enumerate(product(init_values, repeat=repetitions)):
    results[i] = row

Alternative #2:

init_values_args = [init_values] * repetitions
results = np.array(np.meshgrid(*init_values_args)).T.reshape(-1, repetitions)

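Alternative #3:

results = np.indices([max_init] * repetitions).reshape(repetitions, -1).T + 1

#1 is extremely slow. I didn't have the patience to let it finish (I stopped it after a few minutes of processing on a 2017 MacBook Pro). #2 and #3 eat all the memory until the Python interpreter crashes, just like the initial approach.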

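After that, I thought I could express the same information in a different way that would still be useful to me: a dict whose keys are all the possible (sorted) combinations and whose values are the counts of those combinations. So I tried:

Alternative #4: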

from collections import Counter

def sorted_product(iterable, repeat=1):
    for el in product(iterable, repeat=repeat):
        yield tuple(sorted(el))

def count_product(repetitions=1, max_init=6):
    init_values = range(1, max_init + 1)
    sp = sorted_product(init_values, repeat=repetitions)
    counted_sp = Counter(sp)
    return np.array(list(counted_sp.values())), \
        np.array(list(counted_sp.keys()))

cnt, values = count_product(repetitions=repetitions, max_init=max_init)

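However, the line counted_sp = Counter(sp), which triggers consuming all the values of the generator, is also too slow for my needs (it also ran for several minutes before I cancelled it).

Is there another way to generate the same data (or a different data structure containing the same information) that does not have the drawbacks above of being too slow or using too much memory?

PS: I tested all of the implementations above with a small repetitions value, and all the tests passed, so they give consistent results.

I hope that editing the question is the best way to expand on it. Otherwise, let me know and I will edit the post where I should.

After reading the first two answers below and thinking about it, I agree that I was approaching this problem from the wrong angle. Instead of taking a "brute force" approach, I should work with probabilities.

My intention was, later on, for each combination, to:
- count how many values are below a threshold X;
- count how many values are equal to or above threshold X and below a threshold Y;
- count how many values are above threshold Y;
and then group the combinations that have the same counts.

As an illustrative example: if I roll 12 six-sided dice, what is the probability of having M dice with a value < 3, N dice with a value >= 3 and < 4, and P dice with a value > 5, for all possible combinations of M, N and P?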
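A probability-based version of this idea can be sketched with the multinomial distribution. This is only an illustration, not code from the thread: it assumes the three categories cover every face exactly once (below x, from x up to but not including y, and y or above), which is not quite how the thresholds in the example are written. The function names and the default thresholds x=3, y=5 are hypothetical.

```python
from fractions import Fraction
from math import factorial


def category_probabilities(sides=6, x=3, y=5):
    """Exact probability that one die lands in each of three categories.

    Assumption: the categories are v < x, x <= v < y and v >= y,
    so every face belongs to exactly one category.
    """
    values = range(1, sides + 1)
    low = sum(1 for v in values if v < x)
    mid = sum(1 for v in values if x <= v < y)
    high = sum(1 for v in values if v >= y)
    return Fraction(low, sides), Fraction(mid, sides), Fraction(high, sides)


def count_distribution(n_dice=12, sides=6, x=3, y=5):
    """Map every (M, N, P) with M + N + P == n_dice to its probability."""
    p_low, p_mid, p_high = category_probabilities(sides, x, y)
    dist = {}
    for m in range(n_dice + 1):
        for n in range(n_dice - m + 1):
            p = n_dice - m - n
            # Multinomial coefficient: ways to assign the dice to categories.
            ways = factorial(n_dice) // (factorial(m) * factorial(n) * factorial(p))
            dist[(m, n, p)] = ways * p_low**m * p_mid**n * p_high**p
    return dist


dist = count_distribution()
most_likely = max(dist, key=dist.get)
print(len(dist), "triples; most likely:", most_likely, float(dist[most_likely]))
```

For 12 dice this loops over a few dozen (M, N, P) triples instead of 6**12 rolls, and the exact fractions make it easy to check that the probabilities sum to 1.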

So, I think I will close this question in a few days, while I work on this new approach. Thanks for all the feedback and your time!

Answer 1

list(product(range(1,7), repeat=12)) would produce 6**12 = 2,176,782,336 tuples. Whether it is stored as a list or as an array, that is probably too large for most computers.

In [119]: len(list(product(range(1,7),repeat=12)))
....
MemoryError:

Trying to make an array of that size directly:

In [129]: A = np.ones((6**12,12),int)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-129-e833a9e859e0> in <module>()
----> 1 A = np.ones((6**12,12),int)

/usr/local/lib/python3.5/dist-packages/numpy/core/numeric.py in ones(shape, dtype, order)
    190 
    191     """
--> 192     a = empty(shape, dtype, order)
    193     multiarray.copyto(a, 1, casting='unsafe')
    194     return a

ValueError: Maximum allowed dimension exceeded

Memory size of the array, at 4 bytes per item:

In [130]: 4*12*6**12
Out[130]: 104,485,552,128

100GB？
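The same arithmetic for a few item sizes, as a rough sketch. The helper name product_array_bytes is hypothetical; the sizes are the usual 1, 4 and 8 bytes per item of uint8, int32 and int64.

```python
def product_array_bytes(n_values, repetitions, bytes_per_item):
    """Bytes needed to store the full product as a dense 2-D array."""
    rows = n_values ** repetitions  # one row per tuple of the product
    return rows * repetitions * bytes_per_item


for label, size in [("uint8", 1), ("int32", 4), ("int64", 8)]:
    total = product_array_bytes(6, 12, size)
    print(f"{label}: {total:,} bytes = {total / 1e9:.1f} GB")
```

Even at one byte per item the array would need about 26 GB, so a smaller dtype alone does not make the full array practical.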

Why do you need to generate 2B combinations of 7 numbers?

So, using your counter, which reduces the number of items:

In [138]: sp = sorted_product(range(1,7), 2)
In [139]: counter=Counter(sp)
In [140]: counter
Out[140]: 
Counter({(1, 1): 1,
         (1, 2): 2,
         (1, 3): 2,
         (1, 4): 2,
         (1, 5): 2,
         (1, 6): 2,
         (2, 2): 1,
         (2, 3): 2,
         (2, 4): 2,
         (2, 5): 2,
         (2, 6): 2,
         (3, 3): 1,
         (3, 4): 2,
         (3, 5): 2,
         (3, 6): 2,
         (4, 4): 1,
         (4, 5): 2,
         (4, 6): 2,
         (5, 5): 1,
         (5, 6): 2,
         (6, 6): 1})

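That goes from 36 to 21 (for 2 repetitions). It shouldn't be hard to generalize this to more repetitions (combinations? permutations?), but it will still push time and/or memory limits.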
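One possible generalization, offered as a sketch rather than as the method used in this thread: enumerate only the sorted tuples with itertools.combinations_with_replacement, and compute each count as a multinomial coefficient instead of iterating over the full product. The function name count_sorted_products is made up for this illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial


def count_sorted_products(max_init=6, repetitions=12):
    """Map each sorted tuple to the number of product tuples that sort to it."""
    n_fact = factorial(repetitions)
    counts = {}
    values = range(1, max_init + 1)
    # combinations_with_replacement yields every sorted tuple exactly once.
    for combo in combinations_with_replacement(values, repetitions):
        arrangements = n_fact
        # Divide by the factorial of each value's multiplicity
        # (the number of distinct orderings of a multiset).
        for multiplicity in Counter(combo).values():
            arrangements //= factorial(multiplicity)
        counts[combo] = arrangements
    return counts


counts = count_sorted_products(max_init=6, repetitions=12)
print(len(counts), "sorted tuples covering", sum(counts.values()), "rolls")
```

With repetitions=2 this gives the same 21 entries as the Counter output above, and for 12 repetitions it only has to visit a few thousand sorted tuples.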

A variation on meshgrid using mgrid:

In [175]: n=7; A=np.mgrid[[slice(1,7)]*n].reshape(n,-1).T
In [176]: A.shape
Out[176]: (279936, 7)
In [177]: B=np.array(list(product(range(1,7),repeat=7)))
In [178]: B.shape
Out[178]: (279936, 7)
In [179]: A[:10]
Out[179]: 
array([[1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 2],
       [1, 1, 1, 1, 1, 1, 3],
       [1, 1, 1, 1, 1, 1, 4],
       [1, 1, 1, 1, 1, 1, 5],
       [1, 1, 1, 1, 1, 1, 6],
       [1, 1, 1, 1, 1, 2, 1],
       [1, 1, 1, 1, 1, 2, 2],
       [1, 1, 1, 1, 1, 2, 3],
       [1, 1, 1, 1, 1, 2, 4]])
In [180]: B[:10]
Out[180]: 
array([[1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 2],
       [1, 1, 1, 1, 1, 1, 3],
       [1, 1, 1, 1, 1, 1, 4],
       [1, 1, 1, 1, 1, 1, 5],
       [1, 1, 1, 1, 1, 1, 6],
       [1, 1, 1, 1, 1, 2, 1],
       [1, 1, 1, 1, 1, 2, 2],
       [1, 1, 1, 1, 1, 2, 3],
       [1, 1, 1, 1, 1, 2, 4]])
In [181]: np.allclose(A,B)

mgrid is quite a bit faster:

In [182]: timeit B=np.array(list(product(range(1,7),repeat=7)))
317 ms ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [183]: timeit A=np.mgrid[[slice(1,7)]*n].reshape(n,-1).T
13.9 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But yes, it will have the same total memory use and the same limits, for example with n = 10.

Answer 2

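The right answer is: don't. Whatever you want to do with all these combinations, adjust your approach so that you generate them one at a time and use them immediately without storing them, or, better yet, find a way to get the job done without examining every combination. Your current solution works for toy problems, but it is not suitable for larger parameters. Explain what you are trying to do, and perhaps someone here can help.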
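A minimal sketch of the "generate and consume without storing" idea, assuming the work can be done block by block. The generator name product_chunks, the chunk size and the row-sum histogram are hypothetical choices for illustration; only itertools.islice, itertools.product and standard numpy calls are used.

```python
from itertools import islice, product

import numpy as np


def product_chunks(values, repeat, chunk_rows=100_000):
    """Yield the product as a sequence of small 2-D arrays."""
    it = product(values, repeat=repeat)
    while True:
        block = list(islice(it, chunk_rows))  # at most chunk_rows tuples
        if not block:
            return
        # uint8 is enough for die faces and keeps each block small.
        yield np.array(block, dtype=np.uint8)


# Example use: histogram of row sums, accumulated one block at a time.
repeat = 6  # kept small here; larger values only take longer, not more memory
hist = np.zeros(6 * repeat + 1, dtype=np.int64)
for block in product_chunks(range(1, 7), repeat=repeat, chunk_rows=10_000):
    sums = block.sum(axis=1, dtype=np.int64)
    hist += np.bincount(sums, minlength=hist.size)
print(hist.sum(), "rows processed")
```

This bounds memory to one block at a time, but it still visits every combination, so with 12 repetitions it would have to process about 2 billion rows. Avoiding the enumeration altogether, as the answer suggests, remains the better option when it is possible.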
