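I have a NumPy array with missing values. I want to impute each missing value, column by column, with the mean of the nearest values above and below it.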
import numpy as np
arr = np.random.randint(0, 10, (10, 4)).astype(float)
arr[2, 0] = np.nan
arr[4, 3] = np.nan
arr[0, 2] = np.nan
print(arr)
[[ 5. 7. nan 4.] # should be 4
[ 2. 6. 4. 9.]
[nan 2. 5. 5.] # should be 4.5
[ 7. 0. 3. 8.]
[ 6. 4. 3. nan] # should be 4
[ 8. 1. 2. 0.]
[ 0. 0. 1. 1.]
[ 1. 2. 6. 6.]
[ 8. 1. 9. 7.]
[ 3. 5. 8. 8.]]
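For reference, here is a small sketch that spells out the rule as stated above: each NaN becomes the mean of the nearest non-NaN value above it and the nearest non-NaN value below it in the same column, or just the one available value at the top or bottom edge. It assumes a float array and uses a fixed literal copy of the printed array, because the random generation above won't reproduce these exact values. `impute_nearest_mean` is a hypothetical helper name, not a NumPy function. Unlike looking only at the immediately adjacent rows, it skips over runs of consecutive NaNs.

```python
import numpy as np

def impute_nearest_mean(arr):
    """Fill each NaN with the mean of the nearest non-NaN value above it
    and the nearest non-NaN value below it in the same column.
    If only one side has a value, that value is used; columns that are
    entirely NaN are left unchanged. Returns a new array."""
    out = arr.copy()
    for i, j in np.argwhere(np.isnan(arr)):
        col = arr[:, j]  # read from the original, so earlier fills don't leak in
        above = col[:i][~np.isnan(col[:i])]
        below = col[i + 1:][~np.isnan(col[i + 1:])]
        neighbours = []
        if above.size:
            neighbours.append(above[-1])  # closest valid value above
        if below.size:
            neighbours.append(below[0])   # closest valid value below
        if neighbours:
            out[i, j] = np.mean(neighbours)
    return out

# Fixed copy of the array printed above (NaNs at [0, 2], [2, 0], [4, 3]).
arr = np.array([
    [5, 7, np.nan, 4],
    [2, 6, 4, 9],
    [np.nan, 2, 5, 5],
    [7, 0, 3, 8],
    [6, 4, 3, np.nan],
    [8, 1, 2, 0],
    [0, 0, 1, 1],
    [1, 2, 6, 6],
    [8, 1, 9, 7],
    [3, 5, 8, 8],
])
print(impute_nearest_mean(arr))
```

With this array the three NaNs become 4, 4.5 and 4, matching the comments in the question.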
Answer 0 (score: 0)
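for x, y in np.argwhere(np.isnan(arr)):
    # rows x-1..x+1 of column y, clipped to the array bounds
    # (note: arr is updated in place, so an earlier fill can feed a later one)
    sample = arr[max(x - 1, 0):min(x + 2, arr.shape[0]), y]
    # mean of the non-NaN values in that window
    arr[x, y] = np.mean(sample[~np.isnan(sample)])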
print(arr)
[[5. 7. 4. 4. ] # 3rd value here is mean(4)
[2. 6. 4. 9. ]
[4.5 2. 5. 5. ] # first value here is mean(2, 7)
[7. 0. 3. 8. ]
[6. 4. 3. 4. ] # 4th value here is mean(8, 0)
[8. 1. 2. 0. ]
[0. 0. 1. 1. ]
[1. 2. 6. 6. ]
[8. 1. 9. 7. ]
[3. 5. 8. 8. ]]
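The loop above can also be written without Python-level iteration. The sketch below (hypothetical helper `impute_adjacent_mean`, assuming a float array) pads the array with a NaN row on each side, lines every cell up with the rows directly above and below it, and averages whichever of those two neighbours are present.

```python
import numpy as np

def impute_adjacent_mean(arr):
    """Fill each NaN with the mean of the non-NaN cells directly above and
    below it in the same column. Cells with no valid neighbour stay NaN.
    Only original values are used, so one fill never feeds another."""
    # One row of NaN above and below, so the edge rows need no special case.
    padded = np.pad(arr, ((1, 1), (0, 0)), constant_values=np.nan)
    neighbours = np.stack([padded[:-2], padded[2:]])  # rows i-1 and i+1
    valid = ~np.isnan(neighbours)
    sums = np.where(valid, neighbours, 0.0).sum(axis=0)
    counts = valid.sum(axis=0)
    # Divide only where at least one neighbour exists; elsewhere keep NaN.
    means = np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)
    return np.where(np.isnan(arr), means, arr)

demo = np.array([[5.0, np.nan], [np.nan, 4.0], [7.0, np.nan]])
print(impute_adjacent_mean(demo))  # values: [[5, 4], [6, 4], [7, 4]]
```

One design difference: the loop writes into `arr` as it goes, so with two consecutive NaNs the second one can average in the value just imputed for the first. This version reads only the original values, so in a column like `[1, nan, nan, 5]` it fills 1 and 5, where the loop fills 1 and 3.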
Answer 1 (score: 0)
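If you're willing to use pandas, pd.DataFrame.interpolate is easy to use. To also fill values at the start or end of a column, where there is a neighbour on only one side, set limit_direction='both':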
import pandas as pd
df = pd.DataFrame(arr).interpolate(limit_direction='both')
df.to_numpy()  # convert back to a NumPy array if needed (pandas >= 0.24.0; use df.values on older versions)
Output:
array([[5. , 7. , 4. , 4. ],
[2. , 6. , 4. , 9. ],
[4.5, 2. , 5. , 5. ],
[7. , 0. , 3. , 8. ],
[6. , 4. , 3. , 4. ],
[8. , 1. , 2. , 0. ],
[0. , 0. , 1. , 1. ],
[1. , 2. , 6. , 6. ],
[8. , 1. , 9. , 7. ],
[3. , 5. , 8. , 8. ]])
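If pandas isn't available, a similar result can be sketched in plain NumPy with `np.interp`, interpolating each column against its row positions. This assumes the default `method='linear'` of `interpolate`; `interp_columns` is a hypothetical helper name, and the match with pandas is expected for cases like the array above rather than guaranteed in every edge case. Note that for a single NaN between two valid values, linear interpolation is exactly their mean, but for a run of NaNs it produces evenly spaced values rather than one repeated mean.

```python
import numpy as np

def interp_columns(arr):
    """Linearly interpolate NaNs down each column, using row index as x.
    NaNs before the first or after the last valid value take that edge
    value (np.interp clamps outside the data range). All-NaN columns stay NaN."""
    out = arr.copy()
    rows = np.arange(arr.shape[0])
    for j in range(arr.shape[1]):
        col = arr[:, j]
        valid = ~np.isnan(col)
        if valid.any() and not valid.all():
            out[~valid, j] = np.interp(rows[~valid], rows[valid], col[valid])
    return out

run = np.array([[1.0], [np.nan], [np.nan], [7.0]])
print(interp_columns(run))  # values: 1, 3, 5, 7 (a mean rule would give 1, 4, 4, 7)
```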