Consider (m,m) arrays with the property that all entries are nan after some row index i and after some column index j. A typical example is
[[ 0.00528902 0.00202571 0.00339491 nan nan]
[ 0.00777443 0.00322426 0.00503715 nan nan]
[ 0.00699781 0.00185539 0.00433489 nan nan]
[ 0.00526394 0.00254923 0.0034802 nan nan]
[ nan nan nan nan nan]]
In this example A[i,j] is nan if either i>3 or j>2, but in general I only know that such indices exist; I'm not given their values (3 and 2 in this example).
I would like to find the largest submatrix that contains no nan. In the above example that would be
[[ 0.00528902 0.00202571 0.00339491 ]
[ 0.00777443 0.00322426 0.00503715 ]
[ 0.00699781 0.00185539 0.00433489 ]
[ 0.00526394 0.00254923 0.0034802 ]]
In fact, m will be quite large, so I need this to be very efficient (I have to do this for many (m,m) arrays, and the size of the largest subarray containing no nan varies from array to array).
Answer 0 (score: 2)
Take advantage of the structure of the array:
>>> i = A.T[0].searchsorted(np.nan)
>>> j = A[0].searchsorted(np.nan)
>>> A[:i, :j]
array([[0.00528902, 0.00202571, 0.00339491],
[0.00777443, 0.00322426, 0.00503715],
[0.00699781, 0.00185539, 0.00433489],
[0.00526394, 0.00254923, 0.0034802 ]])
Answer 1 (score: 1)
First, I think there is a small error in your question: it should be i>3, not 4, shouldn't it? I took the liberty of editing that.
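So the i,j we are looking for are the indices of the bottom-right corner of the desired submatrix. The most efficient approach that comes to mind is numpy's where function. Consider the following snippet, which uses your example numpy array: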
import numpy as np
a=np.array([[ 0.00528902, 0.00202571,0.00339491, np.nan, np.nan],
[ 0.00777443, 0.00322426 ,0.00503715, np.nan , np.nan],
[ 0.00699781, 0.00185539 ,0.00433489, np.nan , np.nan],
[ 0.00526394, 0.00254923 ,0.0034802 , np.nan , np.nan],
[np.nan, np.nan, np.nan, np.nan , np.nan]])
indexes=np.where(np.logical_not(np.isnan(a)))
print(indexes)
This produces the following output:
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]), array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]))
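The first array in the output holds the row indices and the second array holds the column indices of the non-nan values.
Because the indices are returned in row-major order, the last entry of each array belongs to the bottom-right non-nan element, so the (i,j) you are looking for is given by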
i = indexes[0][-1]  # last row index; in your case, this is 3
j = indexes[1][-1]  # last column index; in your case, this is 2
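To finish this approach, the two corner indices can be turned into a slice. The sketch below is self-contained and uses placeholder values rather than the numbers from the question; the variable names rows, cols and sub are chosen here for illustration. Note that np.where inspects every element, so this costs O(m²) per array, and it raises an IndexError on the last-element lookup if the array holds no non-nan value at all, hence the guard.

```python
import numpy as np

# Placeholder data with the same layout as the question's example:
# a 4x3 NaN-free block in the top-left corner of a 5x5 array.
a = np.full((5, 5), np.nan)
a[:4, :3] = np.arange(12, dtype=float).reshape(4, 3)

# Row and column indices of every non-NaN entry, in row-major order.
rows, cols = np.where(~np.isnan(a))

if rows.size == 0:
    # No valid entries at all: the NaN-free block is empty.
    sub = a[:0, :0]
else:
    # The last non-NaN entry is the bottom-right corner of the block.
    i, j = rows[-1], cols[-1]
    # Slices exclude the end index, so add 1 to include row i and column j.
    sub = a[:i + 1, :j + 1]

print(sub.shape)
```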