Question

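I have height data for probes in several basins. Zero height values are spurious, and I want to replace them with the mean height of the probes in the same basin.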

I first create a DataFrame that holds the height and the basin of each probe:

import pandas as pd

index = [0,1,2,3,4,5]
s = pd.Series([0,2,2,0,1,6], index=index)  # height values
t = pd.Series(['A','A','A','B','B','B'], index=index)  # basin names
df = pd.concat([s,t], axis=1, keys=['Height','Basin'])
print(df)

   Height Basin
0       0     A
1       2     A
2       2     A
3       0     B
4       1     B
5       6     B

Then I try to replace the zero values with the mean of the corresponding basin:

# find the average height in the same basin, ignoring zero readings
bound_df = df[df['Height']>0]
mean_height_df = bound_df.groupby(['Basin'])['Height'].mean()
print(mean_height_df)

Basin
A    2.0
B    3.5

But this raises an error that I don't understand:

File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Basin'

What does this mean? Is it a slicing problem?

Is there an alternative approach?

Answer 1

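I think you're overthinking this. Mask the zeros and use fillna, which fills values by matching index labels. This needs a little setup (setting "Basin" as the index), after which you can use mean_height_df as is.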

# Set "Basin" as the index.
v = df.set_index('Basin')['Height']  
# Mask values <= 0 as NaN, then fill the NaNs with the per-basin mean.
df['Height'] = v.mask(v.le(0)).fillna(mean_height_df).values
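
For reference, here is a self-contained sketch that runs the approach above end to end on the sample data from the question. The names mean_height, filled, masked and alt are only illustrative, and the groupby/transform variant at the end is an alternative I am assuming would suit the same problem; it is not part of the answer above.

```python
import pandas as pd

# Sample data from the question: probe heights and the basin of each probe.
df = pd.DataFrame({
    "Height": [0, 2, 2, 0, 1, 6],
    "Basin": ["A", "A", "A", "B", "B", "B"],
})

# Mean height per basin, computed from the non-zero readings only.
mean_height = df[df["Height"] > 0].groupby("Basin")["Height"].mean()

# Approach from the answer: index the heights by basin so that fillna can
# look up the replacement for each NaN by its basin label.
v = df.set_index("Basin")["Height"]
filled = v.mask(v.le(0)).fillna(mean_height).to_numpy()

# Alternative (an assumption, not from the answer): mask the zeros, then let
# groupby/transform broadcast each basin's mean back onto the original index.
masked = df["Height"].mask(df["Height"] <= 0)
alt = masked.fillna(masked.groupby(df["Basin"]).transform("mean"))

df["Height"] = filled
print(df)
```

On the sample data both variants should give the heights 2.0, 2.0, 2.0, 3.5, 1.0, 6.0. The transform variant avoids the temporary change of index, at the cost of recomputing the means instead of reusing mean_height_df.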

Replace zero attribute values with the mean value of items that share a similar attribute