Masking an n-dimensional NumPy array (to save memory)

Time: 2019-02-07 13:28:38

Tags: python arrays numpy

Suppose I have a high-dimensional NumPy array:

import numpy as np
x = np.zeros((200, 200, 200))

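in which only one contiguous* subarray holds "valid" entries. The other entries can be ignored (in this example, every entry equal to 1 is valid and the 0s can be ignored):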

sub_array = np.s_[100:110, 100:110, 100:110]
x[sub_array] = 1
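To put a number on the waste, here is a quick check of how much memory the dense array uses compared with the valid block alone. It only reuses the array from above; the names `dense_bytes` and `valid_bytes` are introduced here for illustration.

```python
import numpy as np

# Dense array from the question: 200**3 float64 entries
x = np.zeros((200, 200, 200))
sub_array = np.s_[100:110, 100:110, 100:110]
x[sub_array] = 1

dense_bytes = x.nbytes             # 200 * 200 * 200 entries * 8 bytes
valid_bytes = x[sub_array].nbytes  # 10 * 10 * 10 entries * 8 bytes
print(dense_bytes, valid_bytes)    # 64000000 8000
```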

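How can I represent x in Python so that it integrates with other NumPy arrays (slicing, indexing, etc.) but does not waste memory on all the invalid entries?

* If possible, I would also be interested in solutions where the subset is not necessarily an array.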
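For the case in the footnote, where the valid entries do not form a single block, one possibility is to keep only the coordinates and values of the valid entries. The sketch below is an assumption, not something proposed in this thread: it treats every zero entry as invalid, and the helpers `to_sparse` and `to_dense` are hypothetical names.

```python
import numpy as np

def to_sparse(x):
    """Return (coords, values, shape) for the nonzero entries of x."""
    coords = np.argwhere(x != 0)   # (k, ndim) integer coordinates
    values = x[tuple(coords.T)]    # the k nonzero values, in the same order
    return coords, values, x.shape

def to_dense(coords, values, shape):
    """Rebuild the full array, filling every other entry with zero."""
    out = np.zeros(shape, dtype=values.dtype)
    out[tuple(coords.T)] = values
    return out

# Small array with scattered (non-contiguous) valid entries
x = np.zeros((20, 20, 20))
x[3, 4, 5] = 1
x[10:12, 7, 0] = 1

coords, values, shape = to_sparse(x)
restored = to_dense(coords, values, shape)
print(coords.shape)  # (3, 3): three valid entries, three coordinates each
```

Unlike a padded block, this form does not support slicing directly; it has to be converted back with `to_dense` before use.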

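1 Answer:

Answer 0 (score: 1)

For some use cases, you may get away with a crafted class that implements the __array__ method. Here is a sketch of a possible implementation: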

import numpy as np

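class PaddedArray:
    """An array stored as a small inner block plus zero padding on each axis."""

    def __init__(self, arr, padding):
        self._arr = np.array(arr)
        # One (before, after) pair of pad widths per axis
        self._pad = [tuple(map(int, p)) for p in padding]
        assert self._arr.ndim == len(self._pad)
        assert all(len(p) == 2 for p in self._pad)

    def __array__(self, *args, **kwargs):
        # Called by NumPy on conversion; this materializes the full padded array
        ar = np.asarray(self._arr, *args, **kwargs)
        if ar.ndim == 0:
            # A 0-d result has no axes to pad, so return it directly
            return ar
        return np.pad(ar, self._pad, 'constant')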

    def __getitem__(self, idx):
        if not isinstance(idx, (list, tuple)):
            idx = (idx,)
        new_arr = self._arr
        new_pad = list(self._pad)
        i_dim = 0  # axis of new_arr that the next index item applies to
        for s in idx:
            if s is np.newaxis:
                # New unpadded axis of length 1
                new_pad.insert(i_dim, (0, 0))
                new_arr = np.expand_dims(new_arr, i_dim)
                i_dim += 1
                continue
            if s is Ellipsis:
                # TODO - Support ellipsis
                raise NotImplementedError('Ellipsis is not supported')
            if i_dim >= new_arr.ndim:
                raise IndexError('too many indices for array')
            n_arr = new_arr.shape[i_dim]
            p1, p2 = new_pad[i_dim]
            n = n_arr + p1 + p2  # padded length of this axis
            if isinstance(s, (int, np.integer)):
                s = int(s)
                s = s if s >= 0 else s + n
                if not 0 <= s < n:
                    raise IndexError('index out of bounds')
                new_pad.pop(i_dim)
                if s < p1 or s >= n - p2:
                    # Index lands in the padding: the result is all zeros
                    shape = new_arr.shape[:i_dim] + new_arr.shape[i_dim + 1:]
                    new_arr = np.zeros(shape, dtype=new_arr.dtype)
                else:
                    new_arr = np.take(new_arr, s - p1, axis=i_dim)
            elif isinstance(s, slice):
                # TODO - Support arbitrary steps
                if s.step not in (None, 1):
                    raise NotImplementedError('only step 1 is supported')
                # Resolve None and negative bounds, clipped to [0, n]
                start, stop, _ = s.indices(n)
                stop = max(stop, start)
                end = p1 + n_arr  # first index of the trailing padding
                # Part of the slice that overlaps the stored block
                lo = min(max(start - p1, 0), n_arr)
                hi = min(max(stop - p1, lo), n_arr)
                # Parts of the slice that fall in the leading / trailing padding
                new_pad[i_dim] = (min(stop, p1) - min(start, p1),
                                  max(stop, end) - max(start, end))
                new_arr = new_arr[(slice(None),) * i_dim + (slice(lo, hi),)]
                i_dim += 1
            else:
                raise TypeError('unsupported index: {!r}'.format(s))
        return PaddedArray(new_arr, new_pad)

    @property
    def shape(self):
        return tuple(s + p1 + p2 for s, (p1, p2) in zip(self._arr.shape, self._pad))
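The per-axis slice arithmetic is the easiest part of such a class to get wrong, so it may help to check it in isolation. The function below is a minimal sketch under the same assumptions as the class (step 1 only); `split_slice` is a hypothetical helper, not part of the answer. It splits a slice on a padded axis into the leading padding it covers, the range of the stored block it covers, and the trailing padding it covers.

```python
def split_slice(start, stop, p1, n_arr, p2):
    """Map slice(start, stop) on a padded axis to
    (left_pad, inner_start, inner_stop, right_pad).

    p1 and p2 are the pad widths before and after a stored block of length n_arr.
    """
    n = p1 + n_arr + p2
    # Resolve None and negative bounds, clipped to [0, n]
    start, stop, _ = slice(start, stop).indices(n)
    stop = max(stop, start)
    end = p1 + n_arr  # first index of the trailing padding
    inner_start = min(max(start - p1, 0), n_arr)
    inner_stop = min(max(stop - p1, inner_start), n_arr)
    left_pad = min(stop, p1) - min(start, p1)
    right_pad = max(stop, end) - max(start, end)
    return left_pad, inner_start, inner_stop, right_pad

# Axis with 1 leading pad, 4 stored entries, 3 trailing pads (length 8)
print(split_slice(None, 4, 1, 4, 3))  # (1, 0, 3, 0)
print(split_slice(2, -1, 1, 4, 3))    # (0, 1, 4, 2)
print(split_slice(6, None, 1, 4, 3))  # (0, 4, 4, 2)
print(split_slice(None, 0, 1, 4, 3))  # (0, 0, 0, 0)
```

In every case the three parts add up to the length of the requested slice, which is a convenient invariant to test.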

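Obviously, the complicated part is the slicing, and ellipsis (...) and arbitrary slice steps are not supported here. Also, it instantiates the full-size array whenever you actually need to use it. You can do that explicitly with np.asarray, although operating with other np.ndarray objects or calling NumPy functions should trigger the conversion automatically. Here are some usage examples: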

import numpy as np

a = np.arange(12).reshape(4, 3)
print(a)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]
pa = PaddedArray(a, [(1, 3), (0, 2)])
print(pa.shape)
# (8, 5)
print(np.asarray(pa))
# [[ 0  0  0  0  0]
#  [ 0  1  2  0  0]
#  [ 3  4  5  0  0]
#  [ 6  7  8  0  0]
#  [ 9 10 11  0  0]
#  [ 0  0  0  0  0]
#  [ 0  0  0  0  0]
#  [ 0  0  0  0  0]]
print(np.asarray(pa[0]))
# [0 0 0 0 0]
print(np.asarray(pa[:, -3]))
# [ 0  2  5  8 11  0  0  0]
print(np.asarray(pa[3, np.newaxis, 2:]))
# [[8 0 0]]
print(pa[:4, :4] @ a)  # Note it is automatically converted
# [[  0   0   0]
#  [ 15  18  21]
#  [ 42  54  66]
#  [ 69  90 111]]
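To use this with the array from the question, you need the stored block and the per-axis padding. The sketch below shows one way to derive both from a dense array, assuming that zero entries are the ones that can be dropped; `bounding_box_padding` is a hypothetical helper that is not part of the answer above. It only relies on np.pad, so it can be run without the class.

```python
import numpy as np

def bounding_box_padding(x):
    """Return (block, padding) such that np.pad(block, padding) rebuilds x.

    Assumes that zero entries are the ones that can be dropped.
    """
    nz = np.nonzero(x)
    if nz[0].size == 0:
        raise ValueError('array has no nonzero entries')
    lo = [int(idx.min()) for idx in nz]      # first nonzero index per axis
    hi = [int(idx.max()) + 1 for idx in nz]  # one past the last nonzero index
    block = x[tuple(slice(a, b) for a, b in zip(lo, hi))].copy()
    padding = [(a, n - b) for a, b, n in zip(lo, hi, x.shape)]
    return block, padding

x = np.zeros((200, 200, 200))
x[100:110, 100:110, 100:110] = 1

block, padding = bounding_box_padding(x)
print(block.shape)  # (10, 10, 10)
print(padding)      # [(100, 90), (100, 90), (100, 90)]
```

The resulting pair could then be passed on as PaddedArray(block, padding), after which the dense x is no longer needed.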