I am trying to normalize the missing values in a matrix. Here is the code.
import numpy as np
import pandas as pd

ds1 = pd.read_table('https://gist.githubusercontent.com/anonymous/c5530bd1baecb192e148/raw/ds1', sep=' ', header=None)
# ds2 is the second dataset, with columns [0, 1]
ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2', header=None, sep=' ')
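# Mean of the remaining columns for each distinct value in column 0
ds2_mean = ds2.groupby(0).mean()
# Round the group means to whole numbers
ds2_mean = ds2_mean.apply(np.round)
# ds2_mean = ds2.groupby(0).std()
# print(ds2_mean)
# Treat zeros in ds1 as missing values
ds1.replace(0, np.nan, inplace=True)
# print(ds1)
print(ds2_mean[1])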
ds1 = ds1.apply(lambda x: x.fillna(ds2_mean[1]))
The last line should replace the missing values in dataset 1 with the means from ds2_mean[1], but it does not. Is something wrong here?
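One thing worth checking is index alignment: when `fillna` receives a Series, it fills each NaN from the entry with the same index label, and labels with no match stay NaN. The sketch below uses made-up toy data (`ds1_toy`, `means_mismatch`, and `means_match` are hypothetical names, not the real datasets) to show the difference. Whether this is the cause here depends on whether the group keys in ds2 match the row labels of ds1, which I have not verified.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ds1: row labels are 0, 1, 2 (assumed for illustration)
ds1_toy = pd.DataFrame({0: [1.0, np.nan, 3.0], 1: [np.nan, 5.0, np.nan]})

# Fill values whose labels do NOT match the row labels of ds1_toy
means_mismatch = pd.Series([10.0, 20.0, 30.0], index=[101, 102, 103])
# The same fill values, but labelled to match the rows of ds1_toy
means_match = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])

# fillna aligns on labels, so nothing is filled when no label matches
unchanged = ds1_toy.apply(lambda col: col.fillna(means_mismatch))
# With matching labels, each NaN in row i receives means_match[i]
filled = ds1_toy.apply(lambda col: col.fillna(means_match))

print(unchanged.isna().sum().sum())  # number of NaN cells still present
print(filled)
```

If the labels are the problem, comparing `ds1.index` with `ds2_mean.index` should show it directly.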
After that, can I replace each NaN with the mean of its neighbors in dataset 1?
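A minimal sketch of one way to do this, under an assumption the question does not state: "neighbors" is taken to mean the nearest non-missing values to the left and right in the same row. The helper name `fill_with_neighbor_mean` and the sample frame are hypothetical. When only one side has a value (at a row edge), that single value is used.

```python
import numpy as np
import pandas as pd

def fill_with_neighbor_mean(df):
    """Replace each NaN with the mean of its nearest non-missing
    left and right neighbors in the same row."""
    prev_vals = df.ffill(axis=1)   # nearest value to the left
    next_vals = df.bfill(axis=1)   # nearest value to the right
    neighbor_mean = (prev_vals + next_vals) / 2
    # Where one side is missing, fall back to the side that exists
    neighbor_mean = neighbor_mean.fillna(prev_vals).fillna(next_vals)
    # Keep original values; use the neighbor mean only where df is NaN
    return df.where(df.notna(), neighbor_mean)

sample = pd.DataFrame([[1.0, np.nan, 3.0],
                       [np.nan, 4.0, 6.0],
                       [2.0, np.nan, np.nan]])
result = fill_with_neighbor_mean(sample)
print(result)
```

To use column neighbors instead, the same idea applies with `axis=0`. A row that is entirely NaN stays NaN under this approach.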