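I'm trying to remove all leading and trailing all-NaN rows and columns (and their equivalents in arrays with more than two dimensions). While this is quite simple when the number of dimensions is known, I can't think of a good way to do it for arrays of arbitrary dimensionality.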
For example:
import numpy as np
arr1d = np.array([np.nan, np.nan, np.nan, 1, 2, 3, 4, np.nan])
# expected result: np.array([1., 2., 3., 4.])
arr2d = np.array([[np.nan, np.nan, np.nan],
                  [1., 2., np.nan],
                  [4., np.nan, np.nan]])
# expected result: np.array([[1., 2.], [4., np.nan]])
arr3d = np.array([[[np.nan, np.nan, np.nan], [1., np.nan, np.nan], [np.nan, np.nan, np.nan]],
                  [[np.nan, np.nan, np.nan], [2., np.nan, np.nan], [np.nan, np.nan, np.nan]],
                  [[np.nan, np.nan, np.nan], [3., np.nan, np.nan], [np.nan, np.nan, np.nan]]])
# expected result: np.array([[[1.]], [[2.]], [[3.]]])
For the 1D case this is easy:
notnans = np.flatnonzero(~np.isnan(arr1d))
if notnans.size:
    trimmed = arr1d[notnans[0]: notnans[-1]+1]  # slice from first not-nan to the last one
else:
    trimmed = np.zeros(0)
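Wrapped as a function, the same 1D logic might look like the sketch below (trim_nans_1d is a hypothetical name). Returning arr[:0] in the all-NaN case gives an empty result with the input's dtype instead of building a new array with np.zeros(0):

```python
import numpy as np

def trim_nans_1d(arr):
    """Slice a 1D array from its first non-NaN element to its last one."""
    notnans = np.flatnonzero(~np.isnan(arr))
    if notnans.size == 0:
        return arr[:0]  # all-NaN input: empty result, same dtype as arr
    return arr[notnans[0]:notnans[-1] + 1]

arr1d = np.array([np.nan, np.nan, np.nan, 1, 2, 3, 4, np.nan])
print(trim_nans_1d(arr1d))  # [1. 2. 3. 4.]
```

Interior NaNs are kept, because only the ends are sliced off.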
But I'm struggling to come up with an approach for multidimensional arrays (preferably vectorized).
Answer 0 (score: 3)
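Reframing this as "find the indices at which to trim leading and trailing False values from a bool predicate array", you can do it by iterating over the axes and using the multi-axis form of any: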
def arg_trim_zeros(pred, axis=None):
    """
    Produces an nd slice index that trims zeros from the specified dimensions
    """
    # allow multiple axis arguments like other numpy functions
    all_axes = tuple(range(pred.ndim))
    if axis is None:
        axis = all_axes
    elif not isinstance(axis, tuple):
        axis = (axis,)
    # normalize negative axes (e.g. -1) so the all_axes slicing below is correct
    axis = tuple(ax % pred.ndim for ax in axis)
    slices = [slice(None)] * pred.ndim
    # special case if entire input is falsey
    if not pred.any():
        for ax in axis:
            slices[ax] = slice(0, 0)
        return tuple(slices)  # index with a tuple; newer NumPy rejects a list of slices
    # compute slices for each dimension in turn
    for ax in axis:
        # reduce over every axis except ax, leaving a 1D "has any True" vector
        valid = pred.any(axis=all_axes[:ax] + all_axes[ax+1:])
        # argmax is safe here, because we're sure there is at least one True in pred
        start = valid.argmax()
        stop = len(valid) - valid[::-1].argmax()
        slices[ax] = slice(start, stop)
    return tuple(slices)
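The start/stop computation is the part that does the trimming. Here it is on its own for a single axis, as a minimal sketch (axis_slice is a hypothetical name); like the answer, it assumes pred contains at least one True:

```python
import numpy as np

def axis_slice(pred, ax):
    """Slice along ax from the first to the last index holding any True.

    Assumes pred.any() is True; otherwise argmax returns 0 and the slice is wrong.
    """
    others = tuple(a for a in range(pred.ndim) if a != ax)
    valid = pred.any(axis=others)                   # 1D bool vector along ax
    start = int(valid.argmax())                     # index of the first True
    stop = len(valid) - int(valid[::-1].argmax())   # one past the last True
    return slice(start, stop)

arr2d = np.array([[np.nan, np.nan, np.nan],
                  [1., 2., np.nan],
                  [4., np.nan, np.nan]])
pred = ~np.isnan(arr2d)
print(axis_slice(pred, 0), axis_slice(pred, 1))  # slice(1, 3, None) slice(0, 2, None)
```

argmax on a boolean vector returns the position of the first True, so running it on the reversed vector locates the last True from the other end.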
So in your case, this is:
cropped = arr[arg_trim_zeros(~np.isnan(arr))]
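Because the index is a tuple of plain slices, this has the bonus of returning a view of the original array rather than a copy.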
The axis argument isn't strictly needed here, but I thought I'd add it to solve the more general problem.
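The view-versus-copy difference can be checked with np.shares_memory. This sketch hardcodes the slices that arg_trim_zeros computes for the question's arr2d (slice(1, 3) and slice(0, 2)) and compares them with the equivalent np.ix_ index, which is advanced indexing and therefore copies:

```python
import numpy as np

arr2d = np.array([[np.nan, np.nan, np.nan],
                  [1., 2., np.nan],
                  [4., np.nan, np.nan]])

# advanced indexing with np.ix_ (boolean masks per axis) makes a copy
copied = arr2d[np.ix_([False, True, True], [True, True, False])]
# basic slicing makes a view into arr2d
view = arr2d[1:3, 0:2]

print(np.shares_memory(view, arr2d))    # True
print(np.shares_memory(copied, arr2d))  # False
view[0, 0] = 99.0                       # writes through to arr2d
print(arr2d[1, 0])                      # 99.0
```

A view is cheap, but writes to it modify the original; call .copy() on the result if that is not wanted.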
Answer 1 (score: 2)
Here's one approach using np.ix_:
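def remove_nans(a):
    acc = np.maximum.accumulate
    m = ~np.isnan(a)  # True where a is not NaN
    n = a.ndim
    if n==1:
        # keep everything from the first non-NaN through the last one
        return a[acc(m) & acc(m[::-1])[::-1]]
    else:
        # row k of per_axis_combs lists every axis except axis k
        r = np.tile(np.arange(n),n)
        per_axis_combs = np.delete(r,range(0,len(r),n+1)).reshape(-1,n-1)
        per_axis_combs_tuple = map(tuple,per_axis_combs)
        mask = []
        for i in per_axis_combs_tuple:
            m0 = m.any(i)  # 1D: does each index along axis k hold any non-NaN?
            mask.append(acc(m0) & acc(m0[::-1])[::-1])
        # np.ix_ turns the per-axis boolean masks into an open mesh index
        return a[np.ix_(*mask)]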
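The masking trick in remove_nans is a running maximum: np.maximum.accumulate on a boolean vector switches on at the first True and stays on, so AND-ing the forward pass with the reversed backward pass keeps exactly the span from the first True to the last. A standalone illustration (span_mask is a hypothetical name):

```python
import numpy as np

def span_mask(m):
    """True from the first True in the 1D bool array m through the last one."""
    acc = np.maximum.accumulate
    forward = acc(m)               # on from the first True onwards
    backward = acc(m[::-1])[::-1]  # on up to and including the last True
    return forward & backward

m = np.array([False, True, False, True, False])
print(span_mask(m))  # [False  True  True  True False]
```

The interior False at index 2 stays inside the span, which is why interior all-NaN rows survive in the sample runs. The rows of per_axis_combs are simply "all axes except k"; tuple(j for j in range(n) if j != k) is an equivalent, more readable way to build them.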
Sample runs:
1) 1D case:
In [246]: arr1d
Out[246]: array([ nan, nan, nan, 1., 2., 3., nan, 4., nan])
In [247]: remove_nans(arr1d)
Out[247]: array([ 1., 2., 3., nan, 4.])
2) 2D case:
In [248]: arr2d_2
Out[248]:
array([[ nan, nan, nan],
[ nan, nan, nan],
[ 1., 2., nan],
[ nan, nan, nan],
[ 4., nan, nan],
[ nan, nan, nan]])
In [249]: remove_nans(arr2d_2)
Out[249]:
array([[ 1., 2.],
[ nan, nan],
[ 4., nan]])
3) 3D case:
In [250]: arr3d_2
Out[250]:
array([[[ nan, nan, nan],
[ 1., nan, nan],
[ nan, nan, nan],
[ 4., nan, nan],
[ nan, nan, nan]],
[[ nan, nan, nan],
[ 2., nan, nan],
[ nan, nan, nan],
[ nan, nan, nan],
[ nan, nan, nan]],
[[ nan, nan, nan],
[ 3., nan, nan],
[ nan, nan, nan],
[ nan, nan, nan],
[ nan, nan, nan]]])
In [251]: remove_nans(arr3d_2)
Out[251]:
array([[[ 1.],
[ nan],
[ 4.]],
[[ 2.],
[ nan],
[ nan]],
[[ 3.],
[ nan],
[ nan]]])
Answer 2 (score: 1)
Here's a variant of your 1D approach that removes the all-NaN positions along the last axis:
def remove_nans(arr):
    i = tuple(range(arr.ndim-1))         # every axis except the last
    idx=np.isnan(arr).all(axis=i)        # True where a last-axis position is all NaN
    notnans = np.nonzero(~idx)[0]
    return arr[...,notnans]
In [135]: remove_nans(arr1d)
Out[135]: array([ 1., 2., 3., 4.])
In [136]: remove_nans(arr2d)
Out[136]:
array([[ nan, nan],
[ 1., 2.],
[ 4., nan]])
In [137]: remove_nans(arr3d)
Out[137]:
array([[[ nan],
[ 1.],
[ nan]],
[[ nan],
[ 2.],
[ nan]],
[[ nan],
[ 3.],
[ nan]]])
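Note that remove_nans keeps every last-axis position that is not entirely NaN, so it also drops all-NaN positions in the middle, not just leading and trailing ones (the first-to-last slicing in the question and in the other answers keeps them). A quick check on a small hypothetical array with an interior all-NaN column:

```python
import numpy as np

def remove_nans(arr):
    # same as above: drop last-axis positions that are NaN across all other axes
    i = tuple(range(arr.ndim - 1))
    idx = np.isnan(arr).all(axis=i)
    notnans = np.nonzero(~idx)[0]
    return arr[..., notnans]

a = np.array([[1., np.nan, 2.],
              [3., np.nan, 4.]])
print(remove_nans(a))  # the all-NaN middle column is removed as well
```

For the question's examples this makes no difference, because none of them has an interior all-NaN slice.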
In the 2D case, I can apply it again to the transpose:
In [138]: remove_nans(remove_nans(arr2d).T).T
Out[138]:
array([[ 1., 2.],
[ 4., nan]])
That suggests we should be able to do the same thing recursively in 3D, but getting the transposes right is trickier.
In [139]: temp = remove_nans(arr3d)
In [140]: temp = remove_nans(temp.transpose(0,2,1)).transpose(0,2,1)
In [141]: temp
Out[141]:
array([[[ 1.]],
[[ 2.]],
[[ 3.]]])
In [142]: temp = remove_nans(temp.transpose(1,2,0)).transpose(1,2,0)
In [143]: temp
Out[143]:
array([[[ 1.],
[ 2.],
[ 3.]]])
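One way to undo an arbitrary transpose is to apply the inverse permutation, which np.argsort of the axis order computes; for example, the inverse of (1, 2, 0) is (2, 0, 1). A sketch, assuming a hypothetical helper apply_along_last that moves axis ax to the end, applies a function that keeps the number of dimensions, and then restores the original axis order:

```python
import numpy as np

def apply_along_last(arr, ax, func):
    """Move axis ax to the end, apply func, then restore the axis order.

    Assumes func returns an array with the same number of dimensions.
    """
    perm = [a for a in range(arr.ndim) if a != ax] + [ax]
    inverse = np.argsort(perm).tolist()  # inverse permutation: undoes perm
    return func(arr.transpose(perm)).transpose(inverse)

print(np.argsort((1, 2, 0)))  # [2 0 1]

x = np.zeros((3, 1, 2))
# trim axis 0 from length 3 to 2; the other axes keep their positions
print(apply_along_last(x, 0, lambda t: t[..., :2]).shape)  # (2, 1, 2)
```

Swapping a single pair of axes sidesteps the issue, because swapaxes(i, -1) is its own inverse.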
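So the general multidimensional case just iterates over all dimensions, swapping each one with the last, applying the removal, and swapping back.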
def all_remove(arr):
    temp = remove_nans(arr)  # handle the last axis first
    for i in range(arr.ndim-1):
        # swapaxes(i, -1) is its own inverse, so the same call swaps back
        temp = remove_nans(temp.swapaxes(i,-1)).swapaxes(i,-1)
    return temp
In [169]: all_remove(arr1d)
Out[169]: array([ 1., 2., 3., 4.])
In [170]: all_remove(arr2d)
Out[170]:
array([[ 1., 2.],
[ 4., nan]])
In [171]: all_remove(arr3d) # not the same as Out[143]
Out[171]:
array([[[ 1.]],
[[ 2.]],
[[ 3.]]])
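I tried temp = np.moveaxis(remove_nans(np.moveaxis(temp, i, -1)), -1, i)
and got the same result. The difference from Out[143] comes from In [142]: transpose(1,2,0) is not its own inverse (its inverse is transpose(2,0,1)), so that step turned the (3, 1, 1) array into shape (1, 3, 1) instead of restoring it. swapaxes is its own inverse, so all_remove avoids the problem, and its result matches direct indexing: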
In [196]: arr3d[:,:,[0]][:,[1],:][:,:,:]
Out[196]:
array([[[ 1.]],
[[ 2.]],
[[ 3.]]])