Replace NaN in a pandas DataFrame based on row entries

Time: 2017-11-11 23:46:15

Tags: python pandas dataframe

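I have a DataFrame in which each row represents a visit to a doctor and each column contains data from a single diagnostic test. The data are incomplete, and missing values are filled with NaN.

Here is a simplified example: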

   AGE Height     SEX Weight
0   79     40    Male     90
1   79     21    Male     20
2   79    NaN    Male     50
3   79     89    Male    NaN
4   79     90    Male     57
5   81     87  Female    NaN
6   81    NaN  Female     89
7   81     54  Female     79
8   81     21  Female    NaN
9   81     23  Female     23
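
To make the example reproducible, the table above can be rebuilt as follows. This is a minimal sketch: it assumes the measurements are stored as floats with np.nan for the missing entries, which the printed table does not confirm.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; np.nan marks a missing measurement.
df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# Count the missing values in each column.
print(df.isna().sum())
```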

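I would like to replace each NaN with the population mean for patients of the same sex and age. I have been able to create a DataFrame containing the means for each AGE and SEX combination with the following:

age_sex_means = df.groupby(['SEX', 'AGE'])[['Height', 'Weight']].mean()

which produces the following DataFrame: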

                Height  Weight
SEX    AGE                
Female 81     37.0    38.2
Male   79     48.0    43.4

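However, I cannot find a way to replace the NaNs in the first DataFrame with the means contained in the second. The question "Using Pandas to fill NaN entries based on values in a different column, using a dictionary as a guide" seems to address a situation similar to mine, but only with a single index, which evidently will not work in my exact case.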

1 answer:

Answer 0: (score: 1)

Option 1
You can use apply together with fillna

df.groupby(['AGE', 'SEX'], group_keys=False).apply(lambda x: x.fillna(x.mean(numeric_only=True)))

   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000

Option 2
Use transform with combine_first, which returns a copy

df.combine_first(df.groupby(['SEX', 'AGE']).transform('mean'))

   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000

Option 3
The same, using fillna

df.fillna(df.groupby(['SEX', 'AGE']).transform('mean'))

   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
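
Options 2 and 3 both rely on transform('mean') returning one row per original row, so the result lines up with df by index and column label. The sketch below spells that out on the sample data; the names group_means and filled are illustrative, and selecting [['Height', 'Weight']] explicitly is an added precaution rather than part of the answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (float columns, np.nan for gaps).
df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

# One row per original row, each holding the mean of that row's (SEX, AGE) group.
group_means = df.groupby(["SEX", "AGE"])[["Height", "Weight"]].transform("mean")

# fillna aligns on index and column labels, so only NaN cells are replaced.
filled = df.fillna(group_means)
print(filled)
```

Because fillna returns a new frame, df itself is left unchanged.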

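Option 4
Or use update to edit in place. Pass overwrite=False so that only the NaN entries are updated; by default update also overwrites the existing values.

df.update(df.groupby(['SEX', 'AGE']).transform('mean'), overwrite=False)
df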

   AGE  Height     SEX     Weight
0   79   40.00    Male  90.000000
1   79   21.00    Male  20.000000
2   79   60.00    Male  50.000000
3   79   89.00    Male  54.250000
4   79   90.00    Male  57.000000
5   81   87.00  Female  63.666667
6   81   46.25  Female  89.000000
7   81   54.00  Female  79.000000
8   81   21.00  Female  63.666667
9   81   23.00  Female  23.000000
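
One detail worth checking with update: by default it overwrites every overlapping non-NaN value, so filling only the gaps needs overwrite=False. The following is a minimal sketch contrasting the two behaviours on the sample data; the names means, overwritten and filled are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (float columns, np.nan for gaps).
df = pd.DataFrame({
    "AGE": [79] * 5 + [81] * 5,
    "Height": [40, 21, np.nan, 89, 90, 87, np.nan, 54, 21, 23],
    "SEX": ["Male"] * 5 + ["Female"] * 5,
    "Weight": [90, 20, 50, np.nan, 57, np.nan, 89, 79, np.nan, 23],
})

means = df.groupby(["SEX", "AGE"])[["Height", "Weight"]].transform("mean")

# Default overwrite=True: every Height/Weight value becomes its group mean.
overwritten = df.copy()
overwritten.update(means)

# overwrite=False: measured values are kept, only NaN cells are filled.
filled = df.copy()
filled.update(means, overwrite=False)

print(overwritten.loc[0, "Height"], filled.loc[0, "Height"])
```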