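(Edit: I wrote a solution based on hpaulj's answer; see the code at the bottom of this post.)
I wrote a function that subdivides an n-dimensional array into smaller ones such that each subdivision has at most max_chunk_size elements in total.
Since I need to subdivide many arrays of the same shape and then perform operations on the corresponding chunks, the function does not operate on the data itself. Instead it creates an array of "indexers", i.e. an array of (slice(x1, x2), slice(y1, y2), ...) tuples (see the code below). With these indexers I can retrieve a subdivision by calling the_array[indexer[i]] (see the examples below).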
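To illustrate the idea of reusing one set of indexers on several arrays of the same shape, here is a minimal sketch. The arrays `a` and `b` and the hand-written `indexers` list are made up for the example and are not produced by the function discussed in this post.

```python
import numpy as np

# Two arrays of the same shape (hypothetical data for illustration)
a = np.arange(12).reshape(3, 4)
b = a * 10

# Hand-written indexers: each one is a tuple of slices, one slice per axis
indexers = [
    (slice(0, 3), slice(0, 2)),   # left half
    (slice(0, 3), slice(2, 4)),   # right half
]

# The same indexer selects the corresponding chunk from both arrays
chunk_sums = [int((a[ix] + b[ix]).sum()) for ix in indexers]
```

The point is only that the indexers are independent of the data, so they can be computed once and applied to any array with a matching shape.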
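Also, the array of these indexers has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. the blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along axis 0, and so on.
I expected that I should also be able to concatenate these blocks by calling the_array[indexer[i:i+2,j,k]], and that the_array[indexer] would return just the_array. However, such calls result in an error:
IndexError: arrays used as indices must be of integer (or boolean) type
Is there an easy way around this error?
Here is the code: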
import numpy as np
import itertools
from functools import reduce  # reduce is not a builtin in Python 3

def subdivide(shape, max_chunk_size=500000):
    shape = np.array(shape).astype(float)
    total_size = shape.prod()
    # calculate maximum slice shape:
    slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
    # create a list of slices for each dimension:
    slices = [[slice(left, min(right, n))
               for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
              for n, step_size in zip(shape.astype(int), slice_shape)]
    # one object-array element per chunk, each holding a tuple of slices
    # (np.object was removed in NumPy 1.24; the builtin object is equivalent)
    result = np.empty(reduce(lambda a, b: a * len(b), slices, 1), dtype=object)
    for i, el in enumerate(itertools.product(*slices)):
        result[i] = el
    result.shape = np.ceil(shape / slice_shape).astype(int)
    return result
Here is an example usage:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
(slice(0, 2, None), slice(6, 12, None)),
(slice(0, 2, None), slice(12, 15, None))],
[(slice(2, 4, None), slice(0, 6, None)),
(slice(2, 4, None), slice(6, 12, None)),
(slice(2, 4, None), slice(12, 15, None))],
[(slice(4, 6, None), slice(0, 6, None)),
(slice(4, 6, None), slice(6, 12, None)),
(slice(4, 6, None), slice(12, 15, None))]], dtype=object)
>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
[45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
[27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
[81, 82, 83, 84, 85, 86]])
>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type
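A small sketch of why the last call fails, assuming a reasonably recent NumPy: a single tuple of slices is a valid basic index, whereas an object array that contains such tuples is treated as an index array and rejected. The names `one_chunk` and `grid` are illustrative only.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# A single tuple of slices is an ordinary basic index
one_chunk = (slice(2, 4), slice(0, 6))
block = ar[one_chunk]

# An object array holding several such tuples is not a valid index:
# NumPy treats it as an index array, which must be integer or boolean
grid = np.empty(2, dtype=object)
grid[0] = (slice(0, 2), slice(6, 12))
grid[1] = (slice(2, 4), slice(6, 12))

try:
    ar[grid]
    failed = False
except IndexError:
    failed = True
```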
Here is the solution based on hpaulj's answer:
import numpy as np
import itertools

class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()
        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)
        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n))
                   for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))]
                  for n, step_size in zip(shape.astype(int), slice_shape)]
        # grid of chunks; the extra last axis holds one slice per data dimension
        # (np.object was removed in NumPy 1.24; the builtin object is equivalent)
        self.slices = \
            np.array(list(itertools.product(*slices)),
                     dtype=object).reshape(tuple(np.ceil(shape / slice_shape).astype(int)) + (len(shape),))

    def __getitem__(self, args):
        if type(args) != tuple: args = (args,)
        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)
        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]
        # for each data axis, concatenate its slices into one integer index array
        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] +
                                                      [0] * (len(slices.shape) - 2 - i) + [i])])]
                             for i in range(len(slices.shape) - 1)))
Usage example:
>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[0]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> ar[subdiv[:2,1]]
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26],
[36, 37, 38, 39, 40, 41],
[51, 52, 53, 54, 55, 56]])
>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])
>>> ar[subdiv[...,:2]]
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])
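Because the chunks in such a grid are contiguous and aligned along each axis, a rectangular selection of them could alternatively be merged into a single slice per axis, which would return a view rather than a copy. The following is only a sketch of that idea, not part of the solution above. `merge_contiguous` is a hypothetical helper, and the hand-built `grid` is assumed to have the same layout as `Subdivision.slices` (chunk counts followed by one trailing axis of length ndim).

```python
import itertools
import numpy as np

def merge_contiguous(sub):
    """Merge a rectangular block of contiguous chunks into one tuple of slices.

    sub is an object array of shape (n0, ..., n_{d-1}, d) whose last axis
    holds one slice per data dimension.
    """
    ndim = sub.shape[-1]
    first = sub[(0,) * ndim]    # slices of the first chunk in the block
    last = sub[(-1,) * ndim]    # slices of the last chunk in the block
    return tuple(slice(f.start, l.stop) for f, l in zip(first, last))

ar = np.arange(90).reshape(6, 15)

# Hand-built 3x3 grid of chunks, matching the example in this post
rows = [slice(0, 2), slice(2, 4), slice(4, 6)]
cols = [slice(0, 6), slice(6, 12), slice(12, 15)]
grid = np.array(list(itertools.product(rows, cols)), dtype=object).reshape(3, 3, 2)

merged = merge_contiguous(grid[:2, 1:3])
view = ar[merged]
```

This only works when the selected chunks form a gap-free rectangle. For selections with steps or fancy indices, the np.ix_ approach is the more general one.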
Answer 0 (score: 3)
Your slices produce 2x6 and 2x3 arrays.
In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)
In [38]: ar[tuple(subslice[0,0])]
Out[38]:
array([[ 6, 7, 8, 9, 10, 11],
[21, 22, 23, 24, 25, 26]])
My version of numpy wants me to turn subslice into a tuple. That is the same as
ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]
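This is just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2-element tuple of slices, lists, arrays, or integers. It won't work with an array of slice objects.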
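As a quick check of the equivalence described above, the following sketch compares the different spellings of the same index. The variable names are only for illustration.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

a = ar[:2, 6:12]                        # literal slice syntax
b = ar[slice(0, 2), slice(6, 12)]       # explicit slice objects
c = ar[(slice(0, 2), slice(6, 12))]     # the same thing as an explicit tuple

# An object array of slices has to be converted to a tuple before indexing
idx = np.array([slice(0, 2), slice(6, 12)], dtype=object)
d = ar[tuple(idx)]

all_equal = all(np.array_equal(a, other) for other in (b, c, d))
```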
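How to concatenate the results into a larger array is a separate question. That can be done after indexing, or by converting the slices into index lists first.
For example, np.bmat concatenates a 2d arrangement of arrays: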
In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]],
[ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]:
matrix([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[51, 52, 53, 54, 55, 56, 57, 58, 59]])
You could generalize this. It just uses hstack and vstack on the nested lists. The result is an np.matrix, but it can be converted back to an array.
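If a plain ndarray is wanted directly, one possible alternative (assuming NumPy 1.13 or later, where np.block is available) is to pass the same nested list to np.block, or to spell out the hstack/vstack nesting by hand. This is a sketch with the four blocks written out literally rather than taken from subslice.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# 2x2 arrangement of the blocks selected in the example above
blocks = [
    [ar[0:2, 6:12], ar[0:2, 12:15]],
    [ar[2:4, 6:12], ar[2:4, 12:15]],
]

# np.block joins the nested list and returns an ndarray, not an np.matrix
joined = np.block(blocks)

# The same result with explicit hstack/vstack calls
stacked = np.vstack([np.hstack(row) for row in blocks])
```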
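Another approach is to use tools like np.arange, np.r_ and np.ix_ to create index arrays. It takes some playing around to generate an example.
To combine the [0,0] and [0,1] subslices: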
In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]
In [66]: i,j
Out[66]: (array([0, 1]), array([ 6, 7, 8, 9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]:
(array([[0],
[1]]), array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14]]))
In [70]: ar[ix]
Out[70]:
array([[ 6, 7, 8, 9, 10, 11, 12, 13, 14],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
Or with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces a 4x9 array.
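One way this might be generalized to any number of dimensions is sketched below. `ix_from_slices` is a hypothetical helper name, and it assumes the sub-grid is an object array whose last axis holds one slice per data dimension (the layout shown in Out[37] above).

```python
import itertools
import numpy as np

def ix_from_slices(sub):
    """Build an np.ix_ index from a grid of slices.

    sub is an object array of shape (n0, ..., n_{d-1}, d); sub[i0, ..., :]
    holds the d slices that select one chunk.
    """
    ndim = sub.shape[-1]
    axes = []
    for axis in range(ndim):
        # Walk along this grid axis (index 0 on all others) and take the
        # slice that belongs to the same data axis
        key = [0] * ndim + [axis]
        key[axis] = slice(None)
        axis_slices = sub[tuple(key)]
        # np.r_ expands each slice into a range and concatenates them
        axes.append(np.r_[tuple(axis_slices)])
    return np.ix_(*axes)

ar = np.arange(90).reshape(6, 15)
rows = [slice(0, 2), slice(2, 4), slice(4, 6)]
cols = [slice(0, 6), slice(6, 12), slice(12, 15)]
grid = np.array(list(itertools.product(rows, cols)), dtype=object).reshape(3, 3, 2)

ix = ix_from_slices(grid[:2, 1:3])
result = ar[ix]
```

Note that indexing with np.ix_ is advanced indexing, so the result is a copy of the data rather than a view.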