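I have a DataFrame where each row represents a visit to a doctor and each column holds the data from a single diagnostic test. The data is incomplete: missing values are filled with NaN.
Here is a simplified example: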
   AGE  Height     SEX  Weight
0   79      40    Male      90
1   79      21    Male      20
2   79     NaN    Male      50
3   79      89    Male     NaN
4   79      90    Male      57
5   81      87  Female     NaN
6   81     NaN  Female      89
7   81      54  Female      79
8   81      21  Female     NaN
9   81      23  Female      23
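I want to replace each NaN with the population mean for patients of the same sex and age. I have been able to create a DataFrame containing the means for each AGE and SEX combination with the following:
age_sex_means = df.groupby(['SEX', 'AGE'])[['Height', 'Weight']].mean()
which produces this DataFrame: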
            Height     Weight
SEX    AGE
Female 81    46.25  63.666667
Male   79    60.00  54.250000
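But I can't find a way to replace the NaNs in the first DataFrame with the means contained in the second. Questions such as Using Pandas to fill NaN entries based on values in a different column, using a dictionary as a guide seem to address situations similar to mine, but they use only a single index, which apparently won't work in my exact case.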
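Since age_sex_means is already indexed by (SEX, AGE), one way to use it directly is to look up each row's (SEX, AGE) pair in it and then fill the NaNs from that lookup. The following is a minimal sketch that assumes pandas and NumPy and rebuilds the sample data above. The names keys, row_means and filled are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "AGE": [79, 79, 79, 79, 79, 81, 81, 81, 81, 81],
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# Means per (SEX, AGE) group, built as in the question
age_sex_means = df.groupby(["SEX", "AGE"])[["Height", "Weight"]].mean()

# One (SEX, AGE) key per row of df, looked up in age_sex_means.
# reindex accepts repeated target labels, so the result has one row per row of df.
keys = pd.MultiIndex.from_frame(df[["SEX", "AGE"]])
row_means = age_sex_means.reindex(keys)
row_means.index = df.index  # put the looked-up means back on df's row labels

# fillna with a DataFrame fills each NaN from the same row and column of row_means
filled = df.fillna(row_means)
print(filled)
```

Observed values are left untouched, because fillna only writes into cells that are NaN in df.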
Answer
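Option 1
You can use apply together with fillna on the measurement columns and assign the result back to df:
cols = ['Height', 'Weight']
df[cols] = df.groupby(['AGE', 'SEX'], group_keys=False)[cols].apply(lambda x: x.fillna(x.mean()))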
   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
Option 2
Use transform together with combine_first. This returns a new DataFrame (a copy) instead of modifying df:
df.combine_first(df.groupby(['SEX', 'AGE']).transform('mean'))
   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
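The transform('mean') call is what makes this work. It computes each group's mean and broadcasts it back to every row of that group, so the result has exactly the same index as df and can be aligned with it cell by cell. The following sketch assumes pandas and NumPy and rebuilds the sample data. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "AGE": [79, 79, 79, 79, 79, 81, 81, 81, 81, 81],
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# One row per row of df, holding that row's group mean for each
# non-grouping column; SEX and AGE themselves are not in the result.
group_means = df.groupby(["SEX", "AGE"]).transform("mean")
print(group_means)

# Because the index lines up with df, both methods take a value from
# group_means only where df holds NaN.
via_combine_first = df.combine_first(group_means)
via_fillna = df.fillna(group_means)
```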
Option 3
Using fillna:
df.fillna(df.groupby(['SEX', 'AGE']).transform('mean'))
   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
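Option 4
Or use update, which modifies df in place. Use overwrite=False so that only the NaN cells are filled:
df.update(df.groupby(['SEX', 'AGE']).transform('mean'), overwrite=False)
df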
   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
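DataFrame.update defaults to overwrite=True. With that default, every cell that is non-NA in the other frame is copied over, not just the cells that are missing in df. The following sketch compares the default with overwrite=False on copies of the sample data. It assumes pandas and NumPy, and the names overwritten and only_missing are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "AGE": [79, 79, 79, 79, 79, 81, 81, 81, 81, 81],
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

group_means = df.groupby(["SEX", "AGE"]).transform("mean")

# Default (overwrite=True): observed measurements are replaced by group means too
overwritten = df.copy()
overwritten.update(group_means)

# overwrite=False: only cells that are NaN in the copy are taken from group_means
only_missing = df.copy()
only_missing.update(group_means, overwrite=False)

# Row 0's observed Height survives only in only_missing
print(overwritten.loc[0, "Height"], only_missing.loc[0, "Height"])
```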