Merge two DataFrames and keep the first entry for each value of a common column

Time: 2020-07-25 17:54:09

Tags: python pandas dataframe

I have two DataFrames, for example:

df1

sub_id     Weight
1          56
2          67
3          81
5          73
9          59

df2

sub_id     Text
1          He is normal.
1          person is healthy.
1          has strong immune power.
3          She is over weight.
3          person is small.
9          Looks good.
5          Not well.
5          Need to be tested.

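I need to combine these two DataFrames into the result below. When a sub_id appears more than once in the second DataFrame, only its first Text should be kept and joined to the first DataFrame: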

merge_df

sub_id   Weight    Text
1        56        He is normal.
2        67        NaN
3        81        She is over weight.
5        73        Not well.
9        59        Looks good.

Can anyone help me? Thanks in advance.

1 Answer:

Answer 0: (score: 0)

Here you go:

print(pd.merge(df1, df2.drop_duplicates(subset='sub_id'),
         on='sub_id',
         how='outer'))

Output:

   sub_id  Weight                 Text
0       1      56        He is normal.
1       2      67                  NaN
2       3      81  She is over weight.
3       5      73            Not well.
4       9      59          Looks good.
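
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the tables in the question; the intermediate name first_text is only an illustrative choice and not part of the original answer. drop_duplicates keeps the first row per sub_id by default (keep='first'), which is why no extra argument is needed.

```python
import pandas as pd

# Rebuild the two frames exactly as shown in the question.
df1 = pd.DataFrame({
    "sub_id": [1, 2, 3, 5, 9],
    "Weight": [56, 67, 81, 73, 59],
})
df2 = pd.DataFrame({
    "sub_id": [1, 1, 1, 3, 3, 9, 5, 5],
    "Text": [
        "He is normal.", "person is healthy.", "has strong immune power.",
        "She is over weight.", "person is small.", "Looks good.",
        "Not well.", "Need to be tested.",
    ],
})

# Keep only the first row for each sub_id (keep='first' is the default).
first_text = df2.drop_duplicates(subset="sub_id")

# sub_id 2 has no row in df2, so its Text comes out as NaN.
merge_df = pd.merge(df1, first_text, on="sub_id", how="outer")
print(merge_df)
```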

To keep the last duplicate instead, pass keep='last':

print(pd.merge(df1, df2.drop_duplicates(subset='sub_id', keep='last'),
         on='sub_id',
         how='outer'))

Output:

   sub_id  Weight                      Text
0       1      56  has strong immune power.
1       2      67                       NaN
2       3      81          person is small.
3       5      73        Need to be tested.
4       9      59               Looks good.
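
One thing to watch: how='outer' gives the expected result here only because every sub_id in df2 also exists in df1. If df2 could contain ids that df1 lacks, an outer join would add extra rows for them, while how='left' keeps exactly the rows of df1. A small sketch of the difference, assuming a made-up unmatched id (42) that is not in the question's data:

```python
import pandas as pd

df1 = pd.DataFrame({"sub_id": [1, 2], "Weight": [56, 67]})
# sub_id 42 is a hypothetical value with no match in df1.
df2 = pd.DataFrame({
    "sub_id": [1, 1, 42],
    "Text": ["He is normal.", "person is healthy.", "Unmatched."],
})

dedup = df2.drop_duplicates(subset="sub_id")

# Outer join: union of keys, so the unmatched id 42 appears as an extra row.
outer_df = pd.merge(df1, dedup, on="sub_id", how="outer")

# Left join: only the keys of df1 survive, in df1's order.
left_df = pd.merge(df1, dedup, on="sub_id", how="left")

print(outer_df)
print(left_df)
```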
