Character combinations via a generator in Python

Date: 2016-10-15 12:24:51

Tags: python python-2.7 numpy scikit-learn itertools

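I am trying to get all combinations with replacement (in other words, the Cartesian product) of characters for strings of length 8. The output is too large to hold in memory all at once, so my process gets killed before it finishes.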

This is the code I am using; in my experience it is faster than Python's stdlib itertools:

import numpy as np

def cartesian(arrays, out=None):
    """Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])
    """
    arrays = [np.asarray(x) for x in arrays]
    # np.indices expects a sequence of dimensions, so build a tuple
    shape = tuple(len(x) for x in arrays)
    dtype = arrays[0].dtype

    # One row of indices per combination, in C (row-major) order
    ix = np.indices(shape)
    ix = ix.reshape(len(arrays), -1).T

    if out is None:
        out = np.empty_like(ix, dtype=dtype)

    # Map each column of indices back onto the corresponding input array
    for n, arr in enumerate(arrays):
        out[:, n] = arr[ix[:, n]]

    return out

How can I make it return a generator over the results instead of storing everything in memory at once?

1 answer:

Answer 0 (score: 1)

My impression from other questions is that itertools.product is the fastest way to generate Cartesian combinations iteratively:

In [494]: g=itertools.product([1,2,3],[4,5],[6,7])
In [495]: list(g)
Out[495]: 
[(1, 4, 6),
 (1, 4, 7),
 (1, 5, 6),
 (1, 5, 7),
 (2, 4, 6),
 (2, 4, 7),
 (2, 5, 6),
 (2, 5, 7),
 (3, 4, 6),
 (3, 4, 7),
 (3, 5, 6),
 (3, 5, 7)]
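
Since product returns a lazy iterator, the eight-character case from the question can be consumed one item at a time without ever building the full list. Here is a minimal sketch of that idea. The helper name `iter_strings` and the alphabet "abc" are placeholders I made up, since the question does not say which characters are involved:

```python
import itertools


def iter_strings(chars, length):
    """Lazily yield every string of `length` characters drawn from `chars`.

    `iter_strings` is a hypothetical helper name, not part of itertools.
    """
    for combo in itertools.product(chars, repeat=length):
        yield "".join(combo)


# Peek at the first few results; nothing beyond these is ever computed.
first = list(itertools.islice(iter_strings("abc", 8), 3))
print(first)  # ['aaaaaaaa', 'aaaaaaab', 'aaaaaaac']
```

Looping over `iter_strings(...)` directly keeps memory use flat, at the cost of doing the per-item work in Python rather than in NumPy.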

Your code is essentially a mapping over the output of np.indices, which is slower:

In [499]: timeit np.indices((3,2,2)).reshape(3,-1).T
The slowest run took 11.08 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 61.6 µs per loop
In [500]: timeit list(itertools.product([1,2,3],[4,5],[6,7]))
100000 loops, best of 3: 3.51 µs per loop
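
If you still want NumPy arrays downstream, one option is to yield the product in fixed-size blocks of rows instead of all at once. The following is only a sketch under assumptions: the name `cartesian_chunks` and the `chunk_size` parameter are hypothetical, and it swaps np.indices for np.unravel_index applied to a range of flat row numbers, so only one block of indices exists at any time. I have not timed it against product.

```python
import numpy as np


def cartesian_chunks(arrays, chunk_size=100000):
    """Yield the cartesian product of `arrays` as 2-D blocks of rows.

    Hypothetical helper: each yielded array has at most `chunk_size` rows
    and len(arrays) columns, in the same row order as np.indices gives.
    """
    arrays = [np.asarray(x) for x in arrays]
    shape = tuple(len(x) for x in arrays)
    dtype = arrays[0].dtype  # same dtype choice as the original cartesian()

    # Total number of rows, computed with Python ints to avoid overflow.
    total = 1
    for n in shape:
        total *= n

    for start in range(0, total, chunk_size):
        stop = min(start + chunk_size, total)
        # Convert flat row numbers into one index array per input (C order).
        idx = np.unravel_index(np.arange(start, stop), shape)
        out = np.empty((stop - start, len(arrays)), dtype=dtype)
        for n, arr in enumerate(arrays):
            out[:, n] = arr[idx[n]]
        yield out


# Small usage example: 12 rows delivered in blocks of at most 5.
chunks = list(cartesian_chunks(([1, 2, 3], [4, 5], [6, 7]), chunk_size=5))
print([c.shape for c in chunks])  # [(5, 3), (5, 3), (2, 3)]
```

Each block can be processed or written to disk and then discarded, so peak memory is governed by `chunk_size` rather than by the size of the full product.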