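I have a 2D numpy array containing my values (some of which may be NaN). I want to remove 30% of the non-NaN values and replace them with the mean of the array. How can I do this? What I have tried so far: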
def spar_removal(array, mean_value, sparseness):
    array1 = deepcopy(array)
    array2 = array1
    spar_size = int(round(array2.shape[0]*array2.shape[1]*sparseness))
    for i in range(0, spar_size):
        index = np.random.choice(np.where(array2 != mean_value)[1])
        array2[0, index] = mean_value
    return array2
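But this only ever picks elements from the same row of my array. How do I remove values from the whole array? It seems that choice only works on one dimension. I think what I want is to compute the (x, y) pairs whose values I will replace with mean_value.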
Answer 0: (score: 3)
There may be a better way, but consider:
import numpy as np
x = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [np.nan, np.nan, np.nan, np.nan],
              [1, 2, 3, 4]])
# Get a vector of flat (1-D) indices of the non-NaN elements
indices = np.where(np.isfinite(x).ravel())[0]
# Shuffle the indices, select the first 30% (rounded down with int())
to_replace = np.random.permutation(indices)[:int(indices.size * 0.3)]
# Replace those indices with the mean (ignoring NaNs)
x[np.unravel_index(to_replace, x.shape)] = np.nanmean(x)
print(x)
Example output (the replaced positions vary from run to run):
[[ 2.5  2.   2.5  4. ]
 [ 1.   2.   3.   4. ]
 [ nan  nan  nan  nan]
 [ 2.5  2.   3.   4. ]]
The NaNs are never changed, and floor(0.3 * number of non-NaN elements) elements are set to the mean (the mean ignores NaNs).
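If it helps, the same idea can be wrapped into a reusable function. This is only a rough sketch: the name replace_fraction_with_mean and the fraction and seed parameters are made up for illustration, and it tests for NaN with np.isnan rather than np.isfinite. It returns a modified copy instead of changing the input in place, and uses np.random.default_rng so that a run can be reproduced by passing a seed.

```python
import numpy as np

def replace_fraction_with_mean(arr, fraction=0.3, seed=None):
    """Return a float copy of arr in which floor(fraction * n_valid)
    randomly chosen non-NaN entries are set to the NaN-ignoring mean.
    (Hypothetical helper; name and parameters are illustrative.)"""
    rng = np.random.default_rng(seed)
    out = np.array(arr, dtype=float)           # np.array copies by default
    mean_value = np.nanmean(out)               # mean of the original non-NaN values
    flat_idx = np.flatnonzero(~np.isnan(out))  # flat indices of non-NaN entries
    n_replace = int(flat_idx.size * fraction)  # rounded down
    chosen = rng.choice(flat_idx, size=n_replace, replace=False)
    out.flat[chosen] = mean_value              # assign through the flat view
    return out

x = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [np.nan, np.nan, np.nan, np.nan],
              [1, 2, 3, 4]])
y = replace_fraction_with_mean(x, fraction=0.3, seed=0)
print(y)
```

Sampling with replace=False guarantees that no position is picked twice, so exactly n_replace entries are overwritten.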
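Answer 1: (score: 1)
Because np.where returns two arrays containing the indices (one per dimension of a 2D array), this is what you want: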
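import copy
import numpy as np

def spar_removal(array, mean_value, sparseness):
    # Work on a copy so the caller's array is left untouched
    array2 = copy.deepcopy(array)
    # NaN != NaN, so this comparison keeps only the non-NaN positions;
    # for a 2D array np.where returns (row indices, column indices)
    indexs = np.where(array2 == array2)
    indexsL = len(indexs[0])
    # Number of values to replace, as a fraction of the non-NaN values
    spar_size = int(round(indexsL * sparseness))
    # Draw distinct positions so that no element is picked twice
    for i in np.random.choice(indexsL, spar_size, replace=False):
        indexX = indexs[0][i]
        indexY = indexs[1][i]
        array2[indexX, indexY] = mean_value
    return array2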
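To make the point about np.where more concrete, here is a small sketch on a made-up 2x3 array (the values and the seed are only for illustration). It shows the pair of index arrays that np.where returns, and that the chosen (row, column) pairs can be assigned in one step with fancy indexing instead of a Python loop.

```python
import numpy as np

a = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])

# For a 2D array np.where returns one index array per dimension
rows, cols = np.where(~np.isnan(a))
print(rows)  # [0 0 1 1]
print(cols)  # [0 2 0 1]

# Pick two distinct positions into the (rows, cols) pairs
rng = np.random.default_rng(0)
pick = rng.choice(rows.size, size=2, replace=False)

# Assign every chosen (row, col) pair at once on a copy
b = a.copy()
b[rows[pick], cols[pick]] = np.nanmean(a)
print(b)
```

Using only one of the two index arrays, as in the question's code, loses the pairing between rows and columns, which is why the replacements there all ended up in a single row.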