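I want to generate the Cartesian product of a relatively large number of arrays in order to span a high-dimensional grid. Because of the high dimensionality, the result of the Cartesian product cannot be stored in memory; instead, it will be written to the hard disk. Because of this constraint, I need access to the intermediate results as they are generated. What I have done so far is: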
for x in xrange(0, 10):
    for y in xrange(0, 10):
        for z in xrange(0, 10):
            writeToHdd(x,y,z)
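Besides being very ugly, this does not scale (i.e. it requires me to write as many loops as there are dimensions). I tried to use the solution proposed here, but it is a recursive solution, which makes it hard to obtain the results on the fly as they are generated. Is there a 'neat' way to do this other than having one hard-coded loop per dimension?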
Answer 0 (score: 4)
In plain Python, you can use itertools.product to generate the Cartesian product of a set of iterables.
>>> import itertools
>>> arrays = range(0, 2), range(4, 6), range(8, 10)
>>> list(itertools.product(*arrays))
[(0, 4, 8), (0, 4, 9), (0, 5, 8), (0, 5, 9), (1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9)]
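Because itertools.product returns a lazy iterator, the tuples can be consumed one at a time instead of being collected with list, which is what the question's disk-writing constraint needs. A minimal sketch, assuming the output goes to a CSV file; the helper name write_product_csv and the CSV format are hypothetical choices made for illustration:

```python
import csv
import itertools


def write_product_csv(arrays, path):
    """Stream the Cartesian product of `arrays` to a CSV file, one row per tuple.

    Hypothetical helper: only one tuple is held in memory at a time.
    Returns the number of rows written.
    """
    count = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # itertools.product is lazy, so each tuple is produced on demand.
        for point in itertools.product(*arrays):
            writer.writerow(point)
            count += 1
    return count
```

Calling write_product_csv([range(10)] * 3, "grid.csv") would write the same 1000 points as the three nested loops in the question, and adding a dimension only means adding one more iterable to the list.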
In NumPy, you can combine numpy.meshgrid with numpy.ndindex (passing sparse=True to avoid expanding the product in memory):
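>>> import numpy as np
>>> arrays = np.arange(0, 2), np.arange(4, 6), np.arange(8, 10)
>>> grid = np.meshgrid(*arrays, sparse=True)
>>> full = np.broadcast_arrays(*grid)  # zero-copy views with the full grid shape
>>> [tuple(int(g[i]) for g in full) for i in np.ndindex(full[0].shape)]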
[(0, 4, 8), (0, 4, 9), (1, 4, 8), (1, 4, 9), (0, 5, 8), (0, 5, 9), (1, 5, 8), (1, 5, 9)]
Answer 1 (score: 1)
I think I found a nice way using a memory-mapped file:
import numpy as np

def carthesian_product_mmap(vectors, filename, mode='w+'):
    '''
    Vectors should be a tuple of `numpy.ndarray` vectors. You could
    also make it more flexible, and include some error checking
    '''
    # Make a meshgrid with `copy=False` to create views
    grids = np.meshgrid(*vectors, copy=False, indexing='ij')
    # The shape for concatenating the grids from meshgrid
    shape = grids[0].shape + (len(vectors),)
    # Find the "highest" dtype necessary
    dtype = np.result_type(*vectors)
    # Instantiate the memory mapped file
    M = np.memmap(filename, dtype, mode, shape=shape)
    # Fill the memmap with the grids
    for i, grid in enumerate(grids):
        M[...,i] = grid
    # Make sure the data is written to disk (optional?)
    M.flush()
    # Reshape to put it in the right format for Cartesian product
    return M.reshape((-1, len(vectors)))
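But I wonder whether you really need to store the entire Cartesian product (it contains a lot of duplicated data). Isn't it an option to generate the rows of the product only when they are needed?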
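One way to generate rows on demand is to map a flat row number back to one index per vector with numpy.unravel_index. A minimal sketch, assuming rows are ordered as in the 'ij' meshgrid layout above (last vector varies fastest); the helper name product_row is hypothetical:

```python
import numpy as np


def product_row(vectors, flat_index):
    """Return row `flat_index` of the Cartesian product of `vectors`.

    Hypothetical helper: rows follow C order (last vector varies fastest),
    and only the requested row is ever built.
    """
    shape = tuple(len(v) for v in vectors)
    # Convert the flat row number into one index per vector.
    indices = np.unravel_index(flat_index, shape)
    return tuple(v[i] for v, i in zip(vectors, indices))
```

With this approach nothing needs to be stored beyond the input vectors themselves; a row is computed whenever it is requested.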
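Answer 2 (score: 0)
It seems you just want to loop over an arbitrary number of dimensions. My generic solution for this is to use an index array, increment the indices, and handle overflows.
Example: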
n = 3 # number of dimensions
N = 1 # highest index value per dimension
idx = [0]*n
while True:
    print(idx)
    # increase first dimension
    idx[0] += 1
    # handle overflows
    for i in range(0, n-1):
        if idx[i] > N:
            # reset this dimension and increase next higher dimension
            idx[i] = 0
            idx[i+1] += 1
    if idx[-1] > N:
        # overflow in the last dimension, we are finished
        break
This gives:
[0, 0, 0]
[1, 0, 0]
[0, 1, 0]
[1, 1, 0]
[0, 0, 1]
[1, 0, 1]
[0, 1, 1]
[1, 1, 1]
NumPy has something similar built in: ndenumerate.
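For looping over index tuples of a given shape, rather than over the entries of an existing array, numpy.ndindex may be the closer match. A minimal sketch mirroring the example above (n = 3 dimensions, indices from 0 to N = 1); note that ndindex varies the last dimension fastest, the opposite of the hand-written loop:

```python
import numpy as np

n = 3  # number of dimensions
N = 1  # highest index value per dimension

# np.ndindex yields index tuples lazily, in C order (last index varies fastest).
for idx in np.ndindex(*([N + 1] * n)):
    print(idx)
```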