Indexing a numpy array with a numpy array of slices

Time: 2017-02-06 22:15:22

Tags: python arrays numpy slice

(Edit: I wrote a solution based on hpaulj's answer; see the code at the bottom of this post.)

I wrote a function that subdivides an n-dimensional array into smaller arrays such that each subdivision has at most max_chunk_size elements.

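Since I need to subdivide many arrays of the same shape and then operate on the corresponding chunks, the function does not work on the data itself. Instead it creates an array of "indexers", i.e. an array of (slice(x1, x2), slice(y1, y2), ...) tuples (see the code below). With these indexers I can retrieve a subdivision by calling the_array[indexer[i]] (see the examples below).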

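Moreover, the array of indexers has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. the blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along axis 0, and so on.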
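The alignment property can be illustrated with a minimal sketch. The two indexers below are written by hand (hypothetical values, not produced by the function in this post) and are neighbours along axis 0:

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# Two hand-written indexers that are neighbours along axis 0
# (hypothetical values chosen to mimic a 2x6 chunk layout).
upper = (slice(0, 2), slice(6, 12))
lower = (slice(2, 4), slice(6, 12))

# Each indexer is an ordinary tuple of slices, so it can be used directly.
chunk_a = ar[upper]
chunk_b = ar[lower]

# Because the chunks are aligned, stacking them along axis 0
# reproduces the contiguous region ar[0:4, 6:12].
joined = np.concatenate([chunk_a, chunk_b], axis=0)
```

This is the result I would like to get from a single indexing call on the indexer array.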

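I expected that I could also concatenate these blocks by calling the_array[indexer[i:i+2,j,k]], and that the_array[indexer] would simply return the_array. However, such calls result in an error: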

  

IndexError: arrays used as indices must be of integer (or boolean) type

Is there an easy way around this error?

Here is the code:

import numpy as np
import itertools
from functools import reduce  # reduce is not a builtin in Python 3

def subdivide(shape, max_chunk_size=500000):
    shape = np.array(shape).astype(float)
    total_size = shape.prod()

    # calculate maximum slice shape:
    slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)

    # create a list of slices for each dimension:
    slices = [[slice(left, min(right, n)) \
      for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
        for n, step_size in zip(shape.astype(int), slice_shape)]

    result = np.empty(reduce(lambda a, b: a * len(b), slices, 1), dtype=object)
    for i, el in enumerate(itertools.product(*slices)): result[i] = el
    result.shape = np.ceil(shape / slice_shape).astype(int)
    return result

Here is an example usage:

>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
        (slice(0, 2, None), slice(6, 12, None)),
        (slice(0, 2, None), slice(12, 15, None))],
       [(slice(2, 4, None), slice(0, 6, None)),
        (slice(2, 4, None), slice(6, 12, None)),
        (slice(2, 4, None), slice(12, 15, None))],
       [(slice(4, 6, None), slice(0, 6, None)),
        (slice(4, 6, None), slice(6, 12, None)),
        (slice(4, 6, None), slice(12, 15, None))]], dtype=object)

>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
       [45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
       [27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
       [81, 82, 83, 84, 85, 86]])

>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type

Here is the solution based on hpaulj's answer:

import numpy as np
import itertools

class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()

        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)

        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n)) \
          for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
            for n, step_size in zip(shape.astype(int), slice_shape)]

        self.slices = \
            np.array(list(itertools.product(*slices)), \
                     dtype=object).reshape(tuple(np.ceil(shape / slice_shape).astype(int)) + (len(shape),))

    def __getitem__(self, args):
        if type(args) != tuple: args = (args,)

        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)

        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]

        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] + \
                                                      [0] * (len(slices.shape) - 2 - i) + [i])])] \
                                for i in range(len(slices.shape) - 1)))

Example usage:

>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> ar[subdiv[0]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

>>> ar[subdiv[:2,1]]
array([[ 6,  7,  8,  9, 10, 11],
       [21, 22, 23, 24, 25, 26],
       [36, 37, 38, 39, 40, 41],
       [51, 52, 53, 54, 55, 56]])

>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> ar[subdiv[...,:2]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])

1 Answer:

Answer 0 (score: 3)

Your slices produce 2x6 and 2x3 arrays.

In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)

In [38]: ar[tuple(subslice[0,0])]
Out[38]: 
array([[ 6,  7,  8,  9, 10, 11],
       [21, 22, 23, 24, 25, 26]])

My version of numpy wants me to turn the subslice into a tuple. This is the same as

ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]

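This is just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2-element tuple of slices, lists, arrays, or integers. It won't work with an array of slice objects.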
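A small sketch of the difference, using hand-written slices (the variable names are only for illustration): the same pair of slices works as a tuple but fails when stored in an object array.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# A tuple of slices is a valid index for a 2-d array.
as_tuple = (slice(0, 2), slice(6, 12))
block = ar[as_tuple]

# The same slices stored in an object array are treated as an index array,
# which must hold integers or booleans, so indexing fails.
as_array = np.empty(2, dtype=object)
as_array[0], as_array[1] = slice(0, 2), slice(6, 12)
try:
    ar[as_array]
    failed = False
except IndexError:
    failed = True

# Converting back to a tuple restores basic slicing.
recovered = ar[tuple(as_array)]
```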

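How to join the results into a larger array is another matter. It can be done after indexing, or by converting the slices into lists of indices.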

For example, np.bmat joins a 2d arrangement of arrays:

In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]], 
                  [ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]: 
matrix([[ 6,  7,  8,  9, 10, 11, 12, 13, 14],
        [21, 22, 23, 24, 25, 26, 27, 28, 29],
        [36, 37, 38, 39, 40, 41, 42, 43, 44],
        [51, 52, 53, 54, 55, 56, 57, 58, 59]])

You could generalize this. It just uses hstack and vstack on the nested lists. The result is an np.matrix, but it can be converted back to an array.
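One possible generalization, as a sketch only: it swaps np.bmat for np.block, which returns a plain ndarray instead of np.matrix. The helper name join_blocks and the hand-written grid are assumptions for illustration.

```python
import numpy as np

def join_blocks(arr, grid):
    """Index arr with every slice tuple in a 2-d nested list and join the pieces.

    Hypothetical helper: grid is a list of rows, each row a list of slice tuples.
    """
    nested = [[arr[tuple(idx)] for idx in row] for row in grid]
    # np.block joins the innermost lists along the last axis
    # and the outer list along the second-to-last axis.
    return np.block(nested)

ar = np.arange(90).reshape(6, 15)
grid = [
    [(slice(0, 2), slice(6, 12)), (slice(0, 2), slice(12, 15))],
    [(slice(2, 4), slice(6, 12)), (slice(2, 4), slice(12, 15))],
]
result = join_blocks(ar, grid)  # same region as ar[:4, 6:15]
```

Like the bmat version, this copies the data of each block; it does not return a view.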

Another approach is to use tools like np.arange, np.r_, and np.ix_ to construct index arrays. It takes some playing around to generate an example.

To combine the [0,0] and [0,1] subslices:

In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]

In [66]: i,j
Out[66]: (array([0, 1]), array([ 6,  7,  8,  9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]: 
(array([[0],
        [1]]), array([[ 6,  7,  8,  9, 10, 11, 12, 13, 14]]))

In [70]: ar[ix]
Out[70]: 
array([[ 6,  7,  8,  9, 10, 11, 12, 13, 14],
       [21, 22, 23, 24, 25, 26, 27, 28, 29]])

Or, with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces a 4x9 array.
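The np.r_ / np.ix_ steps could be wrapped up roughly as follows. This is a sketch under assumptions: the helper combined_index is hypothetical, it takes one list of slices per axis, and every slice needs an explicit stop because np.r_ expands it with np.arange.

```python
import numpy as np

def combined_index(slices_per_axis):
    """Build an np.ix_ index from one list of slices per axis (hypothetical helper)."""
    # np.r_ expands each slice into integers and concatenates them,
    # giving one 1-d integer index array per axis.
    axes = [np.r_[tuple(axis_slices)] for axis_slices in slices_per_axis]
    # np.ix_ reshapes those arrays so they broadcast into an open mesh.
    return np.ix_(*axes)

ar = np.arange(90).reshape(6, 15)
rows = [slice(0, 2), slice(2, 4)]
cols = [slice(6, 12), slice(12, 15)]
big = ar[combined_index([rows, cols])]  # same region as ar[:4, 6:15]
```

Since this is advanced (integer array) indexing, the result is a copy rather than a view of ar.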