Remove leading and trailing all-NaN axes from a multidimensional array

Time: 2017-05-01 18:28:51

Tags: python numpy multidimensional-array indexing

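I'm trying to remove all leading and trailing all-NaN axes (rows and columns, and whatever they are called for arrays with more than 3 dimensions). This is straightforward when the number of dimensions is known, but I can't think of a good way to do it for arrays of arbitrary dimensionality.

For example: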

import numpy as np
arr1d = np.array([np.nan, np.nan, np.nan, 1, 2, 3, 4, np.nan])
# expected result: np.array([1., 2., 3., 4.])

arr2d = np.array([[np.nan, np.nan, np.nan], 
                  [1., 2., np.nan], 
                  [4., np.nan, np.nan]])
# expected result: np.array([[1., 2.], [4., np.nan]])

arr3d = np.array([[[np.nan, np.nan, np.nan], [1., np.nan, np.nan], [np.nan, np.nan, np.nan]],
                  [[np.nan, np.nan, np.nan], [2., np.nan, np.nan], [np.nan, np.nan, np.nan]],
                  [[np.nan, np.nan, np.nan], [3., np.nan, np.nan], [np.nan, np.nan, np.nan]]])
# expected result: np.array([[[1.]], [[2.]], [[3.]]])

For the 1D case this is easy:

notnans = np.flatnonzero(~np.isnan(arr1d))
if notnans.size:
    trimmed = arr1d[notnans[0]: notnans[-1]+1]  # slice from first not-nan to the last one
else:
    trimmed = np.zeros(0)

But I'm struggling to find an approach (preferably vectorized) that works for multidimensional arrays.

3 Answers:

Answer 0 (score: 3)

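Rephrase the problem as "find the indices that trim leading and trailing False values from a boolean predicate array". You can do this by iterating over the axes and using the multi-axis form of any: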

def arg_trim_zeros(pred, axis=None):
    """
    Produces an nd slice index that trims zeros from the specified dimensions
    """
    # allow multiple axis arguments like other numpy functions
    all_axes = tuple(range(pred.ndim))
    if axis is None:
        axis = all_axes
    elif not isinstance(axis, tuple):
        axis = (axis,)

    slices = [slice(None)] * pred.ndim

    # special case if entire input is falsey
    if not pred.any():
        for ax in axis:
            slices[ax] = slice(0, 0)
        return tuple(slices)  # a tuple, so the result is usable directly as an index

    # compute slices for each dimension in turn
    for ax in axis:
        valid = pred.any(axis=all_axes[:ax] + all_axes[ax+1:])

        # argmax is safe here, because we're sure there is at least one True in pred
        start = valid.argmax()
        stop = len(valid) - valid[::-1].argmax()
        slices[ax] = slice(start, stop)

    return tuple(slices)

So in your case, this is:

cropped = arr[arg_trim_zeros(~np.isnan(arr))]

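This has the bonus of returning a view of the original array rather than a copy, because the index consists only of slices.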

The axis argument isn't strictly needed here, but I figured I'd add it to solve the more general problem.
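
As a rough illustration of the same idea (not the answer's code itself), the sketch below uses a hypothetical, stripped-down helper named trim_slices that has no axis argument. It crops the question's 2D example to the bounding box of the non-NaN values and checks that the result shares memory with the input.

```python
import numpy as np

def trim_slices(pred):
    """Hypothetical helper: per-axis slices bounding the True region of pred."""
    if not pred.any():
        # nothing to keep: return an empty slice on every axis
        return (slice(0, 0),) * pred.ndim
    slices = []
    for ax in range(pred.ndim):
        # reduce over every axis except `ax`
        other_axes = tuple(i for i in range(pred.ndim) if i != ax)
        valid = pred.any(axis=other_axes)
        start = int(valid.argmax())                    # first True along `ax`
        stop = len(valid) - int(valid[::-1].argmax())  # one past the last True
        slices.append(slice(start, stop))
    return tuple(slices)

arr2d = np.array([[np.nan, np.nan, np.nan],
                  [1., 2., np.nan],
                  [4., np.nan, np.nan]])

cropped = arr2d[trim_slices(~np.isnan(arr2d))]
print(cropped.shape)                     # (2, 2)
print(np.shares_memory(cropped, arr2d))  # True
```

Because the result shares memory with arr2d, writing into cropped also changes arr2d; call .copy() on it if an independent array is needed.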

Answer 1 (score: 2)

Here's one approach using np.ix_ -

def remove_nans(a):
    acc = np.maximum.accumulate
    m = ~np.isnan(a)
    n = a.ndim

    if n==1:
        return a[acc(m) & acc(m[::-1])[::-1]]    
    else:
        r = np.tile(np.arange(n),n)
        per_axis_combs = np.delete(r,range(0,len(r),n+1)).reshape(-1,n-1)
        per_axis_combs_tuple = map(tuple,per_axis_combs)

        mask = []
        for i in per_axis_combs_tuple:            
            m0 = m.any(i)            
            mask.append(acc(m0) & acc(m0[::-1])[::-1])
        return a[np.ix_(*mask)]

Sample run -

1) 1D case:

In [246]: arr1d
Out[246]: array([ nan,  nan,  nan,   1.,   2.,   3.,  nan,   4.,  nan])

In [247]: remove_nans(arr1d)
Out[247]: array([  1.,   2.,   3.,  nan,   4.])

2) 2D case:

In [248]: arr2d_2
Out[248]: 
array([[ nan,  nan,  nan],
       [ nan,  nan,  nan],
       [  1.,   2.,  nan],
       [ nan,  nan,  nan],
       [  4.,  nan,  nan],
       [ nan,  nan,  nan]])

In [249]: remove_nans(arr2d_2)
Out[249]: 
array([[  1.,   2.],
       [ nan,  nan],
       [  4.,  nan]])

3) 3D case:

In [250]: arr3d_2
Out[250]: 
array([[[ nan,  nan,  nan],
        [  1.,  nan,  nan],
        [ nan,  nan,  nan],
        [  4.,  nan,  nan],
        [ nan,  nan,  nan]],

       [[ nan,  nan,  nan],
        [  2.,  nan,  nan],
        [ nan,  nan,  nan],
        [ nan,  nan,  nan],
        [ nan,  nan,  nan]],

       [[ nan,  nan,  nan],
        [  3.,  nan,  nan],
        [ nan,  nan,  nan],
        [ nan,  nan,  nan],
        [ nan,  nan,  nan]]])

In [251]: remove_nans(arr3d_2)
Out[251]: 
array([[[  1.],
        [ nan],
        [  4.]],

       [[  2.],
        [ nan],
        [ nan]],

       [[  3.],
        [ nan],
        [ nan]]])
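
The core of remove_nans is the expression acc(m) & acc(m[::-1])[::-1]. A minimal sketch of what it computes on a small, made-up boolean mask (the variable names are illustrative, not from the answer):

```python
import numpy as np

# True marks a position that holds a non-NaN value
m = np.array([False, False, True, False, True, False])

# running maximum from the left: True from the first True onwards
seen_from_left = np.maximum.accumulate(m)
# running maximum from the right: True up to and including the last True
seen_from_right = np.maximum.accumulate(m[::-1])[::-1]

# True only between the first and the last True, interior gaps included
keep = seen_from_left & seen_from_right
print(keep.tolist())  # [False, False, True, True, True, False]
```

This is why the interior all-NaN rows in the sample runs are kept: only the leading and trailing positions fall outside the mask.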

Answer 2 (score: 1)

Here's a variation on your 1D approach that removes the all-NaN slices along the last axis.

def remove_nans(arr):
    i = tuple(range(arr.ndim-1))
    idx=np.isnan(arr).all(axis=i)
    notnans = np.nonzero(~idx)[0]
    return arr[...,notnans]

In [135]: remove_nans(arr1d)
Out[135]: array([ 1.,  2.,  3.,  4.])
In [136]: remove_nans(arr2d)
Out[136]: 
array([[ nan,  nan],
       [  1.,   2.],
       [  4.,  nan]])
In [137]: remove_nans(arr3d)
Out[137]: 
array([[[ nan],
        [  1.],
        [ nan]],

       [[ nan],
        [  2.],
        [ nan]],

       [[ nan],
        [  3.],
        [ nan]]])

In the second case, I can apply it again on the transpose:

In [138]: remove_nans(remove_nans(arr2d).T).T
Out[138]: 
array([[  1.,   2.],
       [  4.,  nan]])

This suggests that we should be able to do the same thing repeatedly for 3D. But getting the transposes right is trickier.

In [139]: temp = remove_nans(arr3d)
In [140]: temp = remove_nans(temp.transpose(0,2,1)).transpose(0,2,1)
In [141]: temp
Out[141]: 
array([[[ 1.]],

       [[ 2.]],

       [[ 3.]]])
In [142]: temp = remove_nans(temp.transpose(1,2,0)).transpose(1,2,0)
In [143]: temp
Out[143]: 
array([[[ 1.],
        [ 2.],
        [ 3.]]])
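
The shape changed in Out[143] because applying transpose(1,2,0) a second time does not undo the first application. A small sketch, assuming a made-up 2x3x4 array, that derives the inverse permutation with np.argsort:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
perm = (1, 2, 0)

b = a.transpose(perm)
print(b.shape)                  # (3, 4, 2)

# applying the same permutation again does not restore the original layout
print(b.transpose(perm).shape)  # (4, 2, 3)

# the inverse permutation is the argsort of the permutation
inv = tuple(int(i) for i in np.argsort(perm))
print(inv)                      # (2, 0, 1)
print(np.array_equal(b.transpose(inv), a))  # True
```

Swapping two axes with swapaxes(i, -1) avoids the issue, since a swap is its own inverse.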

So the generalized multidimensional case just iterates over all dimensions, swapping each one with the last and applying the removal.

def all_remove(arr):
    temp = remove_nans(arr)
    for i in range(arr.ndim-1):
        temp = remove_nans(temp.swapaxes(i,-1)).swapaxes(i,-1)
    return temp
In [169]: all_remove(arr1d)
Out[169]: array([ 1.,  2.,  3.,  4.])
In [170]: all_remove(arr2d)
Out[170]: 
array([[  1.,   2.],
       [  4.,  nan]])
In [171]: all_remove(arr3d)    # not the same as before
Out[171]: 
array([[[ 1.]],

       [[ 2.]],

       [[ 3.]]])

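I tried temp = np.moveaxis(remove_nans(np.moveaxis(temp, i,-1)),-1,i) and got the same result. I'd have to check other 3D and 4D cases to track down the difference.

Though one could argue that these last cases are right: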

In [196]: arr3d[:,:,[0]][:,[1],:][:,:,:]
Out[196]: 
array([[[ 1.]],

       [[ 2.]],

       [[ 3.]]])