NumPy: impute every NaN with the mean of the two nearest rows

Time: 2019-12-14 00:04:38

Tags: python numpy

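I have a NumPy array with missing values. I want to impute each one vertically, using the mean of the nearest values above and below it in the same column.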

import numpy as np

arr = np.random.randint(0, 10, (10, 4)).astype(float)

arr[2, 0] = np.nan
arr[4, 3] = np.nan
arr[0, 2] = np.nan

print(arr)
[[ 5.  7. nan  4.] # should be 4
 [ 2.  6.  4.  9.]
 [nan  2.  5.  5.] # should be 4.5
 [ 7.  0.  3.  8.]
 [ 6.  4.  3. nan] # should be 4
 [ 8.  1.  2.  0.]
 [ 0.  0.  1.  1.]
 [ 1.  2.  6.  6.]
 [ 8.  1.  9.  7.]
 [ 3.  5.  8.  8.]]

2 answers:

Answer 0 (score: 0)

import numpy as np

arr = np.random.randint(0, 10, (10, 4)).astype(float)

arr[2, 0] = np.nan
arr[4, 3] = np.nan
arr[0, 2] = np.nan
print(arr)
[[ 5.  7. nan  4.]
 [ 2.  6.  4.  9.]
 [nan  2.  5.  5.]
 [ 7.  0.  3.  8.]
 [ 6.  4.  3. nan]
 [ 8.  1.  2.  0.]
 [ 0.  0.  1.  1.]
 [ 1.  2.  6.  6.]
 [ 8.  1.  9.  7.]
 [ 3.  5.  8.  8.]]
for x, y in np.argwhere(np.isnan(arr)):
    # Rows x-1..x+1 of column y, clamped to the array bounds
    sample = arr[max(x - 1, 0):min(x + 2, arr.shape[0]), y]
    # Mean of the neighbours that are not NaN
    arr[x, y] = np.mean(sample[~np.isnan(sample)])
print(arr)
[[5.  7.  4.  4. ] # 3rd value here is mean(4)
 [2.  6.  4.  9. ]
 [4.5 2.  5.  5. ] # first value here is mean(2, 7)
 [7.  0.  3.  8. ]
 [6.  4.  3.  4. ] # 4th value here is mean(8, 0)
 [8.  1.  2.  0. ]
 [0.  0.  1.  1. ]
 [1.  2.  6.  6. ]
 [8.  1.  9.  7. ]
 [3.  5.  8.  8. ]]
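
The loop above can also be wrapped in a function that leaves the input untouched. This is only a sketch: the name `impute_vertical_mean` is hypothetical, and it assumes that a NaN with no valid neighbour directly above or below should stay NaN. Neighbours are read from the original array, so a value filled earlier never feeds into a later one.

```python
import numpy as np

def impute_vertical_mean(arr):
    """Return a copy of arr where each NaN is replaced by the mean of the
    non-NaN values directly above and below it in the same column.
    (Hypothetical helper, not part of NumPy.)"""
    out = arr.copy()
    n_rows = arr.shape[0]
    for r, c in np.argwhere(np.isnan(arr)):
        # Rows r-1..r+1 of column c, clamped to the array bounds
        window = arr[max(r - 1, 0):min(r + 2, n_rows), c]
        valid = window[~np.isnan(window)]
        if valid.size:  # assumption: keep NaN if no neighbour is available
            out[r, c] = valid.mean()
    return out

# First six rows of the array from the question
arr = np.array([[5., 7., np.nan, 4.],
                [2., 6., 4., 9.],
                [np.nan, 2., 5., 5.],
                [7., 0., 3., 8.],
                [6., 4., 3., np.nan],
                [8., 1., 2., 0.]])
result = impute_vertical_mean(arr)
print(result)
```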

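Answer 1 (score: 0)

If you are willing to use pandas, pd.DataFrame.interpolate is easy to use. Set limit_direction if you also need to fill values at the ends of the array, where there is nothing to interpolate between.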

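import pandas as pd

# Linear interpolation down each column; 'both' also fills NaNs in the first and last rows
df = pd.DataFrame(arr).interpolate(limit_direction='both')
df.to_numpy()    # back to a NumPy array if needed (to_numpy requires pandas 0.24.0 or later)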

Output:

array([[5. , 7. , 4. , 4. ],
       [2. , 6. , 4. , 9. ],
       [4.5, 2. , 5. , 5. ],
       [7. , 0. , 3. , 8. ],
       [6. , 4. , 3. , 4. ],
       [8. , 1. , 2. , 0. ],
       [0. , 0. , 1. , 1. ],
       [1. , 2. , 6. , 6. ],
       [8. , 1. , 9. , 7. ],
       [3. , 5. , 8. , 8. ]])
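
One caveat, shown as a small self-contained sketch: linear interpolation matches "mean of the two nearest rows" only when a NaN is isolated. For a run of consecutive NaNs it produces evenly spaced values instead. The variable names here are illustrative and the array is the first six rows from the question.

```python
import numpy as np
import pandas as pd

arr = np.array([[5., 7., np.nan, 4.],
                [2., 6., 4., 9.],
                [np.nan, 2., 5., 5.],
                [7., 0., 3., 8.],
                [6., 4., 3., np.nan],
                [8., 1., 2., 0.]])

# Isolated NaNs: same result as averaging the rows above and below
filled = pd.DataFrame(arr).interpolate(limit_direction='both').to_numpy()
print(filled)

# Consecutive NaNs: values are spaced linearly between the valid endpoints
run = pd.Series([1., np.nan, np.nan, 4.]).interpolate().to_numpy()
print(run)
```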