Suppose I have a high-dimensional numpy array:
import numpy as np
x = np.zeros((200, 200, 200))
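Only one contiguous* sub-array of it holds "valid" entries. All other entries can be ignored (in this example every entry equal to 1 is valid, and the 0s can be ignored):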
sub_array = np.s_[100:110, 100:110, 100:110]
x[sub_array] = 1
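How can I represent x in Python so that it integrates with other numpy arrays (slicing, indexing, etc.) without wasting memory on all the invalid entries?
*If possible, I would also be interested in solutions where the subset is not necessarily an array.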
Answer 0 (score: 1)
For several use cases you may get away with a thin class that implements the __array__ method. Here is a sketch of a possible implementation:
import numpy as np

class PaddedArray:
    def __init__(self, arr, padding):
        self._arr = np.array(arr)
        # One (before, after) pair of zero-padding widths per axis
        self._pad = [tuple(map(int, p)) for p in padding]
        assert self._arr.ndim == len(self._pad)
        assert all(len(p) == 2 for p in self._pad)

    def __array__(self, *args, **kwargs):
        # Materializes the full padded array on demand
        ar = np.asarray(self._arr, *args, **kwargs)
        return np.pad(ar, self._pad, 'constant')

    def __getitem__(self, idx):
        if not isinstance(idx, (list, tuple)):
            idx = (idx,)
        new_arr = self._arr
        new_pad = list(self._pad)
        i_dim = 0
        for s in idx:
            if s is np.newaxis:
                # Handled first so a trailing np.newaxis never reads past the last axis
                new_pad.insert(i_dim, (0, 0))
                new_arr = np.expand_dims(new_arr, i_dim)
                i_dim += 1
                continue
            n_arr = new_arr.shape[i_dim]
            p1, p2 = new_pad[i_dim]
            n = n_arr + p1 + p2
            if s is Ellipsis:
                # TODO - Support ellipsis
                raise NotImplementedError('ellipsis is not supported')
            elif isinstance(s, (int, np.integer)):
                s = int(s)
                s = s if s >= 0 else s + n
                assert 0 <= s < n
                new_pad.pop(i_dim)
                if s < p1 or s >= n - p2:
                    # Index falls in the padding: the result is all zeros.
                    # Built from the shape so it also works when the stored axis is empty.
                    shape = new_arr.shape[:i_dim] + new_arr.shape[i_dim + 1:]
                    new_arr = np.zeros(shape, dtype=new_arr.dtype)
                else:
                    # A scalar index makes np.take drop the axis
                    new_arr = np.take(new_arr, s - p1, axis=i_dim)
            elif isinstance(s, slice):
                # TODO - Support arbitrary steps
                assert s.step in (None, 1)
                # Compare against None so that an explicit 0 bound is honoured
                start = int(s.start) if s.start is not None else 0
                stop = int(s.stop) if s.stop is not None else n
                start = start if start >= 0 else start + n
                stop = stop if stop >= 0 else stop + n
                start = min(max(start, 0), n)
                stop = min(max(stop, start), n)
                d = stop - start
                empty = (slice(None),) * i_dim + (slice(0, 0),)
                if d == 0:
                    new_pad[i_dim] = (0, 0)
                    new_arr = new_arr[empty]
                elif stop < p1 or start >= n - p2:
                    # Slice lies entirely inside the padding
                    new_pad[i_dim] = (d, 0)
                    new_arr = new_arr[empty]
                else:
                    new_pad[i_dim] = (max(p1 - start, 0), max(stop - p1 - n_arr, 0))
                    new_arr = new_arr[(slice(None),) * i_dim + (slice(max(start - p1, 0), min(stop - p1, n_arr)),)]
                i_dim += 1
            else:
                raise TypeError('unsupported index: {!r}'.format(s))
        return PaddedArray(new_arr, new_pad)

    @property
    def shape(self):
        return tuple(s + p1 + p2 for s, (p1, p2) in zip(self._arr.shape, self._pad))
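The Ellipsis branch in __getitem__ is left as a TODO. One way it might be filled in is to rewrite the index before the loop runs, replacing ... with the run of full slices it stands for. The sketch below is only an illustration: expand_ellipsis is a hypothetical helper, not part of the class above, and it assumes the index holds only integers, slices, np.newaxis and at most one Ellipsis.

```python
import numpy as np

def expand_ellipsis(idx, ndim):
    """Replace a single Ellipsis in idx with the full slices it stands for.

    Hypothetical helper: ndim is the number of axes of the indexed array.
    """
    if not isinstance(idx, tuple):
        idx = (idx,)
    n_ellipsis = sum(1 for s in idx if s is Ellipsis)
    if n_ellipsis == 0:
        return idx
    if n_ellipsis > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    # np.newaxis adds an axis instead of consuming one, so it is not counted
    n_consumed = sum(1 for s in idx if s is not Ellipsis and s is not np.newaxis)
    n_fill = ndim - n_consumed
    if n_fill < 0:
        raise IndexError("too many indices for array")
    pos = next(i for i, s in enumerate(idx) if s is Ellipsis)
    return idx[:pos] + (slice(None),) * n_fill + idx[pos + 1:]

print(expand_ellipsis((0, ..., 2), 4))
# (0, slice(None, None, None), slice(None, None, None), 2)
```

With something like this, __getitem__ could call expand_ellipsis(idx, self._arr.ndim) right after normalizing idx to a tuple, and the existing slice branch would take care of the rest.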
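Clearly the tricky part is the slicing, and ellipsis (...) and arbitrary slice steps are not supported here. Also, the full big array is materialized whenever you actually need to use it. You can do that explicitly with np.asarray, although operating with other np.ndarray objects or calling NumPy functions should trigger the conversion automatically. Here are some usage examples: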
import numpy as np
a = np.arange(12).reshape(4, 3)
print(a)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]
pa = PaddedArray(a, [(1, 3), (0, 2)])
print(pa.shape)
# (8, 5)
print(np.asarray(pa))
# [[ 0  0  0  0  0]
#  [ 0  1  2  0  0]
#  [ 3  4  5  0  0]
#  [ 6  7  8  0  0]
#  [ 9 10 11  0  0]
#  [ 0  0  0  0  0]
#  [ 0  0  0  0  0]
#  [ 0  0  0  0  0]]
print(np.asarray(pa[0]))
# [0 0 0 0 0]
print(np.asarray(pa[:, -3]))
# [ 0  2  5  8 11  0  0  0]
print(np.asarray(pa[3, np.newaxis, 2:]))
# [[8 0 0]]
print(pa[:4, :4] @ a) # Note it is automatically converted
# [[  0   0   0]
#  [ 15  18  21]
#  [ 42  54  66]
#  [ 69  90 111]]
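Applied to the array from the question, the same idea means storing only the 10x10x10 block plus one pair of padding widths per axis. The following is a rough sketch that uses np.pad directly rather than the class, so that it runs on its own; the names full_shape, block and padding are illustrative, and float64 entries are assumed as in the question's np.zeros call.

```python
import numpy as np

full_shape = (200, 200, 200)
sub_array = np.s_[100:110, 100:110, 100:110]

# Store only the valid block ...
block = np.ones((10, 10, 10))
# ... and, per axis, how many zeros lie before and after it
padding = [(s.start, n - s.stop) for s, n in zip(sub_array, full_shape)]
print(padding)
# [(100, 90), (100, 90), (100, 90)]

print(block.nbytes)
# 8000

# Materializing the dense array is what __array__ does on demand
dense = np.pad(block, padding, 'constant')
print(dense.shape)
# (200, 200, 200)
print(dense.nbytes)
# 64000000
```

The same block and padding could be handed to the class as PaddedArray(block, padding), which keeps the 8000-byte block in memory until something forces the conversion to a full array.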