Question

I have two data frames, for example:

df1

sub_id     Weight
1          56
2          67
3          81
5          73
9          59

df2

sub_id     Text
1          He is normal.
1          person is healthy.
1          has strong immune power.
3          She is over weight.
3          person is small.
9          Looks good.
5          Not well.
5          Need to be tested.

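I need to combine these two data frames to get the result below. When df2 has several rows for the same sub_id, only the first Text should be picked and joined to df1: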

merge_df

sub_id   Weight    Text
1        56        He is normal.
2        67        Nan.
3        81        She is over weight.
5        73        Not well.
9        59        Looks good.

Can anyone help me? Thanks in advance.

Answer 1

Here you go:

# drop_duplicates keeps the first row per sub_id by default (keep='first')
print(pd.merge(df1, df2.drop_duplicates(subset='sub_id'),
               on='sub_id',
               how='outer'))

Output

   sub_id  Weight                 Text
0       1      56        He is normal.
1       2      67                  NaN
2       3      81  She is over weight.
3       5      73            Not well.
4       9      59          Looks good.
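
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two frames from the question. It assumes pandas is installed. The name first_text is mine, not from the answer, and I use how='left' rather than how='outer' as a small variation: for this data both give the same rows, but a left merge keeps exactly the rows of df1.

```python
import pandas as pd

# Frames rebuilt from the question's example data.
df1 = pd.DataFrame({"sub_id": [1, 2, 3, 5, 9],
                    "Weight": [56, 67, 81, 73, 59]})
df2 = pd.DataFrame({
    "sub_id": [1, 1, 1, 3, 3, 9, 5, 5],
    "Text": ["He is normal.", "person is healthy.", "has strong immune power.",
             "She is over weight.", "person is small.", "Looks good.",
             "Not well.", "Need to be tested."],
})

# Keep only the first row for each sub_id (keep='first' is the default).
first_text = df2.drop_duplicates(subset="sub_id")

# Left merge: every row of df1 is kept; sub_id 2 has no match, so Text is NaN.
merge_df = pd.merge(df1, first_text, on="sub_id", how="left")
print(merge_df)
```

Note that "first" here means first in row order of df2, so sort df2 beforehand if a different ordering should decide which Text wins.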

To keep the last duplicate instead, pass the parameter keep='last':

print(pd.merge(df1, df2.drop_duplicates(subset='sub_id', keep='last'),
               on='sub_id',
               how='outer'))

Output

   sub_id  Weight                      Text
0       1      56  has strong immune power.
1       2      67                       NaN
2       3      81          person is small.
3       5      73        Need to be tested.
4       9      59               Looks good.
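
One thing worth knowing about how='outer': it also keeps sub_id values that appear only in df2. The sketch below illustrates the difference with a deliberately small, partly made-up data set (the row with sub_id 7 is hypothetical and not in the question). The names last_text, outer and left are mine.

```python
import pandas as pd

df1 = pd.DataFrame({"sub_id": [1, 2], "Weight": [56, 67]})
df2 = pd.DataFrame({
    "sub_id": [1, 1, 7],  # sub_id 7 is a hypothetical row with no match in df1
    "Text": ["He is normal.", "person is healthy.", "hypothetical extra row"],
})

# Keep the last row for each sub_id.
last_text = df2.drop_duplicates(subset="sub_id", keep="last")

# Outer merge: union of keys, so sub_id 7 shows up with a missing Weight.
outer = pd.merge(df1, last_text, on="sub_id", how="outer")

# Left merge: only the keys of df1 survive.
left = pd.merge(df1, last_text, on="sub_id", how="left")

print(outer)
print(left)
```

If the result should contain exactly the rows of df1, as in the question's expected merge_df, how='left' is probably the safer choice.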

Merge two data frames and select the first entry based on a common column