I have a large 1-D NumPy array that contains NaNs. I need to find all slices that do not contain any NaNs:
import numpy as np
A=np.array([1.0,2.0,3.0,np.nan,4.0,3.0,np.nan,np.nan,np.nan,2.0,2.0,2.0])
The expected result for this example is:
Slices=[slice(0,3),slice(4,6),slice(9,12)]
Answer 0 (score: 1)
Here is one possibility:
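import numpy as np

def valid_slices(array):
    # Positions of the non-NaN entries
    m = ~np.isnan(array)
    idx = np.arange(len(array))[m]
    if idx.size == 0:
        # Every entry is NaN, so there is no valid slice
        return []
    # A jump larger than 1 between consecutive positions separates two runs
    idx_diff = np.diff(idx)
    idx_change = np.where(idx_diff > 1)[0]
    idx_start = np.concatenate([[0], idx_change + 1], axis=0)
    idx_end = np.concatenate([idx_change, [len(idx) - 1]], axis=0)
    # int() keeps plain Python integers in the slices, matching the output shown below
    return [slice(int(idx[start]), int(idx[end]) + 1) for start, end in zip(idx_start, idx_end)]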
A = np.array([1.0,2.0,3.0,np.nan,4.0,3.0,np.nan,np.nan,np.nan,2.0,2.0,2.0])
print(valid_slices(A))
>>> [slice(0, 3, None), slice(4, 6, None), slice(9, 12, None)]
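To see why this works, it can help to look at the intermediate arrays for the example input. The following is only an illustrative walk-through of the same steps, not a different method; the values in the comments are what the example array produces.

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0, np.nan, 4.0, 3.0,
              np.nan, np.nan, np.nan, 2.0, 2.0, 2.0])

# Positions of the non-NaN entries: [0, 1, 2, 4, 5, 9, 10, 11]
idx = np.arange(len(A))[~np.isnan(A)]

# Gaps between neighbouring positions: [1, 1, 2, 1, 4, 1, 1]
idx_diff = np.diff(idx)

# A gap larger than 1 marks the last element of a run: [2, 4]
idx_change = np.where(idx_diff > 1)[0]

# First and last position (within idx) of each run
idx_start = np.concatenate([[0], idx_change + 1])        # [0, 3, 5]
idx_end = np.concatenate([idx_change, [len(idx) - 1]])   # [2, 4, 7]

# Map back to positions in A; the stop is exclusive, hence the + 1
slices = [slice(int(idx[s]), int(idx[e]) + 1) for s, e in zip(idx_start, idx_end)]
print(slices)
```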
Answer 1 (score: 1)
One way to get such a list of slices, doing as little work as possible inside the list comprehension:
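def start_stop_nonNaN_slices(A):
    mask = ~np.isnan(A)
    # Pad with False on both sides so runs touching either end still produce an edge
    mask_ext = np.r_[False, mask, False]
    # Edges alternate start, stop; reshape them into (start, stop) rows
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1]).reshape(-1,2)
    # tolist() yields plain Python integers, matching the output shown below
    return [slice(start, stop) for start, stop in idx.tolist()]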
Sample run:
In [32]: A
Out[32]:
array([ 1., 2., 3., nan, 4., 3., nan, nan, nan, 2., 2.,
2.])
In [33]: start_stop_nonNaN_slices(A)
Out[33]: [slice(0, 3, None), slice(4, 6, None), slice(9, 12, None)]
In [35]: A
Out[35]:
array([ nan, 1., 2., 3., nan, 4., 3., nan, nan, nan, 2.,
2., 2.])
In [36]: start_stop_nonNaN_slices(A)
Out[36]: [slice(1, 4, None), slice(5, 7, None), slice(10, 13, None)]
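The padding step is what makes this work at the array boundaries. As a rough sketch using the example array from the question, the intermediate values look like this (the names `edges` and `pairs` are only for illustration):

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0, np.nan, 4.0, 3.0,
              np.nan, np.nan, np.nan, 2.0, 2.0, 2.0])

mask = ~np.isnan(A)

# Padding with False guarantees that every run of True has both a rising
# and a falling edge, even when it touches the start or end of the array.
mask_ext = np.r_[False, mask, False]

# Positions where the padded mask changes value. They alternate between
# the start of a run and the (exclusive) stop of that run.
edges = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
print(edges.tolist())   # [0, 3, 4, 6, 9, 12]

# The number of edges is always even, so they pair up into rows.
pairs = edges.reshape(-1, 2)
print(pairs.tolist())   # [[0, 3], [4, 6], [9, 12]]
```

Because the stop positions are already exclusive, each row can be passed directly to `slice`.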
Output in different formats
I. If you need the start and stop indices as pairs of tuples:
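def start_stop_nonNaN_slices_v2(A):
    mask = ~np.isnan(A)
    mask_ext = np.r_[False, mask, False]
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    # In Python 3, zip returns an iterator, so wrap it in list();
    # tolist() gives plain Python integers in the tuples
    return list(zip(idx[::2].tolist(), idx[1::2].tolist()))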
Sample run:
In [51]: A
Out[51]:
array([ nan, 1., 2., 3., nan, 4., 3., nan, nan, nan, 2.,
2., 2., nan, nan])
In [52]: start_stop_nonNaN_slices_v2(A)
Out[52]: [(1, 4), (5, 7), (10, 13)]
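II. If you are fine with getting the start and stop indices as two output arrays, this should be very efficient, since it avoids any list comprehension or zipping: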
def start_stop_nonNaN_slices_v3(A):
    mask = ~np.isnan(A)
    mask_ext = np.r_[False, mask, False]
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    return idx[::2], idx[1::2]
Sample run:
In [74]: A
Out[74]:
array([ nan, 1., 2., 3., nan, 4., 3., nan, nan, nan, 2.,
2., 2., nan, nan])
In [75]: starts, stops = start_stop_nonNaN_slices_v3(A)
In [76]: starts
Out[76]: array([ 1, 5, 10])
In [77]: stops
Out[77]: array([ 4, 7, 13])
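Having the starts and stops as arrays also makes follow-up computations easy to vectorize. As a small sketch (the helper name `start_stop_arrays` is hypothetical and just repeats the logic above so the snippet is self-contained), here is one way to get the run lengths and pull out the longest NaN-free run:

```python
import numpy as np

def start_stop_arrays(A):
    # Same edge-detection logic as the function above
    mask = ~np.isnan(A)
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    return idx[::2], idx[1::2]

A = np.array([np.nan, 1., 2., 3., np.nan, 4., 3.,
              np.nan, np.nan, np.nan, 2., 2., 2., np.nan, np.nan])

starts, stops = start_stop_arrays(A)

# Length of every NaN-free run, computed without a Python loop
lengths = stops - starts
print(lengths.tolist())   # [3, 2, 3]

# np.argmax returns the first maximum, so ties go to the earliest run
longest = int(np.argmax(lengths))
segment = A[starts[longest]:stops[longest]]
print(segment.tolist())   # [1.0, 2.0, 3.0]
```

Note that `np.argmax` would raise on an input with no valid run at all (for example an all-NaN array), so a real caller may want to check `lengths.size` first.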
A note on performance: to improve performance, we can use np.concatenate in place of np.r_:
mask_ext = np.concatenate(( [False], mask, [False] ))
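Whether this matters in practice depends on the array size and the NumPy version, so it is worth measuring. The sketch below first checks that both constructions give the same padded mask and then times them with `timeit`; the array size, NaN fraction, and repeat count are arbitrary assumptions chosen only for illustration.

```python
import timeit
import numpy as np

# Synthetic test data: one million values with roughly 10% NaNs (assumed sizes)
rng = np.random.default_rng(0)
A = rng.random(1_000_000)
A[rng.random(A.size) < 0.1] = np.nan
mask = ~np.isnan(A)

# Both constructions should yield the same boolean array
ext_r = np.r_[False, mask, False]
ext_c = np.concatenate(([False], mask, [False]))
same = np.array_equal(ext_r, ext_c)
print("identical:", same)

# Time each variant; absolute numbers will differ between machines
t_r = timeit.timeit(lambda: np.r_[False, mask, False], number=20)
t_c = timeit.timeit(lambda: np.concatenate(([False], mask, [False])), number=20)
print(f"np.r_: {t_r:.4f}s   np.concatenate: {t_c:.4f}s")
```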