Largest sub-array with no NaN

Time: 2018-03-08 22:10:28

Tags: arrays numpy matrix nan

Consider (m,m) arrays that have the property that all entries are nan's after a row index i, and after a column index j. A typical example is

[[ 0.00528902  0.00202571  0.00339491         nan         nan]
 [ 0.00777443  0.00322426  0.00503715         nan         nan]
 [ 0.00699781  0.00185539  0.00433489         nan         nan]
 [ 0.00526394  0.00254923  0.0034802          nan         nan]
 [        nan         nan         nan         nan         nan]]

In this example A[i,j] is nan if either i>3 or j>2, but in general I only know that such indices exist; I'm not given their values (3 and 2 in this example).

I would like to find the largest submatrix that contains no nan. In the above example that would be

[[ 0.00528902  0.00202571  0.00339491 ]
 [ 0.00777443  0.00322426  0.00503715 ]
 [ 0.00699781  0.00185539  0.00433489 ]
 [ 0.00526394  0.00254923  0.0034802  ]]

In fact, m will be quite large, so I need this to be very efficient (I have to do this for many (m,m) arrays, and the size of the largest subarray containing no nan varies from array to array).

2 Answers:

Answer 0 (score: 2)

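Taking full advantage of the structure of the array:

  • it is enough to scan the first row and the first column
  • we can use bisection to locate the first nan; for this we can use searchsorted, relying on the facts that
    • nan sorts after everything else
    • it does not matter that the part of the line left of the last non-nan is not actually sorted, because we only ever test against a single nan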
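
To make the bisection idea concrete, here is a minimal hand-written sketch. The helper name first_nan is hypothetical (it is not part of NumPy), and it assumes the 1-D input holds all of its non-nan values before any nan, as the first row and first column do here.

```python
import numpy as np

def first_nan(line):
    """Index of the first nan in a 1-D array whose nans form a trailing block.

    Hypothetical helper: assumes every non-nan value precedes every nan.
    Returns len(line) when the array contains no nan at all.
    """
    lo, hi = 0, len(line)
    while lo < hi:
        mid = (lo + hi) // 2
        if np.isnan(line[mid]):
            hi = mid        # the first nan is at mid or to its left
        else:
            lo = mid + 1    # the first nan is strictly to the right of mid
    return lo

row = np.array([0.00528902, 0.00202571, 0.00339491, np.nan, np.nan])
print(first_nan(row))  # prints 3
```

Each probe only asks "is this entry nan?", so the order of the non-nan values never matters, and the loop needs about log2(m) probes. searchsorted performs the same kind of search in compiled code.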

>>> i = A.T[0].searchsorted(np.nan)
>>> j = A[0].searchsorted(np.nan)
>>> A[:i, :j]
array([[0.00528902, 0.00202571, 0.00339491],
       [0.00777443, 0.00322426, 0.00503715],
       [0.00699781, 0.00185539, 0.00433489],
       [0.00526394, 0.00254923, 0.0034802 ]])
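
Since this has to run on many arrays, the two lookups could be wrapped in a small function. The following is only a sketch under the question's assumptions; the name nan_free_block is made up for illustration, and it expects a 2-D float array with at least one row.

```python
import numpy as np

def nan_free_block(A):
    """Return the top-left block of A that contains no nan.

    Hypothetical helper: assumes A is a 2-D float array in which nans fill
    everything from some row index on and from some column index on.
    """
    i = A[:, 0].searchsorted(np.nan)  # first nan in the first column
    j = A[0].searchsorted(np.nan)     # first nan in the first row
    return A[:i, :j]                  # basic slicing returns a view, not a copy

A = np.full((5, 5), np.nan)
A[:4, :3] = [[0.00528902, 0.00202571, 0.00339491],
             [0.00777443, 0.00322426, 0.00503715],
             [0.00699781, 0.00185539, 0.00433489],
             [0.00526394, 0.00254923, 0.0034802]]

block = nan_free_block(A)
print(block.shape)  # prints (4, 3)
```

If an array has no nan at all, both searches return the full length and the whole array comes back; if it is entirely nan, the result is an empty (0, 0) block.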

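Answer 1 (score: 1)

First, I think there is a small error in your question: it should be i>3, not 4, shouldn't it? I took the liberty of editing that.

So the goal is to find i and j, the indices of the bottom-right corner of the desired submatrix. The most efficient way that comes to mind is to use the where function in numpy. Consider the following snippet, which uses your example numpy array: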

import numpy as np


a=np.array([[ 0.00528902, 0.00202571,0.00339491, np.nan, np.nan],
    [ 0.00777443, 0.00322426  ,0.00503715, np.nan , np.nan],
    [ 0.00699781, 0.00185539  ,0.00433489, np.nan , np.nan],
    [ 0.00526394, 0.00254923  ,0.0034802 , np.nan , np.nan],
    [np.nan, np.nan, np.nan, np.nan , np.nan]])

indexes=np.where(np.logical_not(np.isnan(a)))
print(indexes)

which produces the following output:

(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]), array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]))

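The first array in the output gives the row indices, and the second the column indices, of the entries that hold non-nan values.

So we can clearly see that, in your case, the (i,j) you are looking for is given by

i=indexes[0][-1] #in your case, this is 3
j=indexes[1][-1] #in your case, this is 2

These are inclusive indices, so the submatrix itself is a[:i+1, :j+1].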
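
Putting the pieces together, a self-contained sketch of the where-based approach might look like this. The variable names rows, cols and sub are chosen here for illustration. Row indices come from the first array returned by where and column indices from the second, and the slice bounds need +1 because slicing excludes the end index.

```python
import numpy as np

a = np.full((5, 5), np.nan)
a[:4, :3] = [[0.00528902, 0.00202571, 0.00339491],
             [0.00777443, 0.00322426, 0.00503715],
             [0.00699781, 0.00185539, 0.00433489],
             [0.00526394, 0.00254923, 0.0034802]]

# Row and column indices of every non-nan entry, in row-major order.
rows, cols = np.where(~np.isnan(a))

i = rows[-1]  # last row that holds a non-nan value (3 here)
j = cols[-1]  # last column that holds a non-nan value (2 here)

sub = a[:i + 1, :j + 1]  # +1 because the slice end is exclusive
print(sub.shape)  # prints (4, 3)
```

Note that isnan and where look at all m*m entries, so the cost grows with the full array size, and rows[-1] raises an IndexError if the array contains only nan.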