How to fill missing values in one column based on another column

Time: 2019-03-08 17:04:41

Tags: python pandas dataframe

I have a dataframe called shoes

Brand   Comment
Ugg       NaN
Prada     NaN
Clarks    NaN
Ugg       NaN
Clark     NaN
Prada     Made from horse leather
Prada     Made from pig leather
Prada     NaN
Ugg       Made from Australian cow leather
...

and another dataframe, df_mode, which was obtained by taking the mode of the non-null comments for each shoe brand in the shoes dataframe

Brand  Comment
Ugg    Made from sheep 
Prada  Made from pig leather
Clarks Made from Cow leather
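
For reference, a df_mode like the one above could be built roughly as follows. This is only a sketch: the sample rows are illustrative, not the real data, and picking the first value returned by mode() is an assumed way to break ties.

```python
import numpy as np
import pandas as pd

# Illustrative sample rows modeled on the question (not the real data).
shoes = pd.DataFrame({
    "Brand": ["Ugg", "Prada", "Prada", "Prada", "Ugg", "Prada"],
    "Comment": [np.nan, "Made from pig leather", "Made from horse leather",
                "Made from pig leather", "Made from sheep", np.nan],
})

# Drop rows with a missing comment, then take the most frequent comment per brand.
# Series.mode() can return several values on a tie; .iloc[0] keeps the first one.
df_mode = (
    shoes.dropna(subset=["Comment"])
         .groupby("Brand")["Comment"]
         .agg(lambda s: s.mode().iloc[0])
         .reset_index()
)
print(df_mode)
```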

How can I fill the missing Comment values in the shoes dataframe with the corresponding mode for each brand from the df_mode dataframe?

This is basically what I am trying to achieve

Brand   Comment
Ugg       Made from sheep
Prada     Made from pig leather
Clarks    Made from Cow leather
Ugg       Made from sheep
Clark     Made from Cow leather
Prada     Made from horse leather
Prada     Made from pig leather
Prada     Made from pig leather
Ugg       Made from Australian cow leather

3 Answers:

Answer 0: (score: 0)

Use np.where with a Brand-to-Comment dictionary built from df_mode

shoes['Comment'] = np.where(shoes['Comment'].isnull(), shoes['Brand'].map(dict(zip(df_mode['Brand'], df_mode['Comment']))), shoes['Comment'])
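
To see the np.where approach end to end, here is a minimal sketch on a small made-up sample. The name brand_to_mode is just an illustrative helper, not something from the question.

```python
import numpy as np
import pandas as pd

# Small illustrative sample; the real dataframes are larger.
shoes = pd.DataFrame({
    "Brand": ["Ugg", "Prada", "Prada"],
    "Comment": [np.nan, np.nan, "Made from horse leather"],
})
df_mode = pd.DataFrame({
    "Brand": ["Ugg", "Prada"],
    "Comment": ["Made from sheep", "Made from pig leather"],
})

# Build a Brand -> Comment lookup from df_mode (hypothetical helper name).
brand_to_mode = dict(zip(df_mode["Brand"], df_mode["Comment"]))

# Where Comment is missing, take the mapped mode; otherwise keep the original value.
shoes["Comment"] = np.where(
    shoes["Comment"].isnull(),
    shoes["Brand"].map(brand_to_mode),
    shoes["Comment"],
)
print(shoes)
```

np.where takes exactly three arguments here: the condition, the value to use where it is true, and the value to use where it is false.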

Answer 1: (score: 0)

Use loc and map

shoes.loc[shoes.Comment.isna(), 'Comment'] = shoes.Brand.map(df_mode.set_index('Brand')['Comment'])
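
One thing worth checking with this approach: map only fills brands that appear in df_mode, so a brand spelled differently (such as "Clark" versus "Clarks" in the sample data) would stay NaN. A small sketch with made-up rows, where lookup is an illustrative name:

```python
import numpy as np
import pandas as pd

shoes = pd.DataFrame({
    "Brand": ["Ugg", "Clark", "Prada"],
    "Comment": [np.nan, np.nan, "Made from horse leather"],
})
df_mode = pd.DataFrame({
    "Brand": ["Ugg", "Prada", "Clarks"],
    "Comment": ["Made from sheep", "Made from pig leather", "Made from Cow leather"],
})

# Series indexed by Brand; this assumes df_mode has one row per brand.
lookup = df_mode.set_index("Brand")["Comment"]

# Only rows with a missing Comment are overwritten; the assignment aligns on the index.
shoes.loc[shoes["Comment"].isna(), "Comment"] = shoes["Brand"].map(lookup)

# "Clark" has no entry in df_mode, so its Comment is still NaN.
print(shoes[shoes["Comment"].isna()])
```

If every brand is expected to match, normalizing the Brand spelling before mapping may be the simplest fix.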

Answer 2: (score: 0)

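You can group by the "Brand" column first and then fill the missing values within each group (df here stands for the shoes dataframe). Note that this fills each gap from the nearest non-null comment of the same brand (forward fill, then backward fill), which is not necessarily the mode. Using transform keeps the result aligned with the original index. Here is an implementation:

df['Comment'] = df.groupby('Brand', sort=False)['Comment'].transform(lambda x: x.ffill().bfill())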
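
If the goal is specifically the per-brand mode, the groupby idea can be adapted to fill with the mode directly, without a separate df_mode. The sketch below uses made-up rows and a hypothetical helper fill_with_mode, and compares it with the ffill/bfill variant:

```python
import numpy as np
import pandas as pd

shoes = pd.DataFrame({
    "Brand": ["Prada", "Prada", "Prada", "Prada", "Ugg"],
    "Comment": [np.nan, "Made from horse leather", "Made from pig leather",
                "Made from pig leather", np.nan],
})

def fill_with_mode(s):
    """Fill NaN in one brand's comments with that brand's most frequent comment."""
    modes = s.mode()  # mode() ignores NaN by default
    # A brand with no comments at all has no mode, so leave it unchanged.
    return s.fillna(modes.iloc[0]) if not modes.empty else s

grouped = shoes.groupby("Brand", sort=False)["Comment"]
filled_mode = grouped.transform(fill_with_mode)
filled_neighbor = grouped.transform(lambda s: s.ffill().bfill())

# Row 0 differs: the mode is pig leather, the nearest neighbor is horse leather.
print(pd.DataFrame({"mode": filled_mode, "neighbor": filled_neighbor}))
```

A brand whose comments are all missing (Ugg in this sample) stays NaN under both variants.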