I have two data frames, for example:
df1
sub_id Weight
1 56
2 67
3 81
5 73
9 59
df2
sub_id Text
1 He is normal.
1 person is healthy.
1 has strong immune power.
3 She is over weight.
3 person is small.
9 Looks good.
5 Not well.
5 Need to be tested.
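I need to combine these two data frames into the result below. When a sub_id appears more than once in the second df, only its first Text should be taken and joined to the first df: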
merge_df
sub_id Weight Text
1 56 He is normal.
2 67 NaN
3 81 She is over weight.
5 73 Not well.
9 59 Looks good.
Can anyone help me? Thanks in advance.
Answer 0 (score: 0)
Here you go:
print(pd.merge(df1, df2.drop_duplicates(subset='sub_id'),
               on='sub_id',
               how='outer'))
Output:
   sub_id  Weight                 Text
0       1      56        He is normal.
1       2      67                  NaN
2       3      81  She is over weight.
3       5      73            Not well.
4       9      59          Looks good.
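For anyone who wants to reproduce this, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the tables in the question; the variable names first_text and merge_df are only illustrative. drop_duplicates keeps the first row per sub_id by default (keep='first'), which is what makes the merge pick the first Text.

```python
import pandas as pd

# Frames rebuilt by hand from the tables in the question.
df1 = pd.DataFrame({"sub_id": [1, 2, 3, 5, 9],
                    "Weight": [56, 67, 81, 73, 59]})
df2 = pd.DataFrame({
    "sub_id": [1, 1, 1, 3, 3, 9, 5, 5],
    "Text": ["He is normal.", "person is healthy.",
             "has strong immune power.", "She is over weight.",
             "person is small.", "Looks good.",
             "Not well.", "Need to be tested."],
})

# Keep only the first row for each sub_id (keep="first" is the default).
first_text = df2.drop_duplicates(subset="sub_id")

# sub_id 2 has no row in df2, so its Text comes out as NaN.
merge_df = pd.merge(df1, first_text, on="sub_id", how="outer")
print(merge_df)
```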
To keep the last duplicate instead, pass keep='last':
print(pd.merge(df1, df2.drop_duplicates(subset='sub_id', keep='last'),
               on='sub_id',
               how='outer'))
Output:
   sub_id  Weight                      Text
0       1      56  has strong immune power.
1       2      67                       NaN
2       3      81          person is small.
3       5      73        Need to be tested.
4       9      59               Looks good.
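One thing to be aware of: how='outer' also keeps any sub_id that exists only in df2. That makes no difference for the data in the question, because every sub_id in df2 is also in df1. If the result should contain exactly the rows of df1, how='left' may be the safer choice. The sketch below uses small made-up frames (sub_id 10 and the text "Extra row." are hypothetical, not from the question) to show the difference.

```python
import pandas as pd

df1 = pd.DataFrame({"sub_id": [1, 2], "Weight": [56, 67]})
# sub_id 10 is a hypothetical id that does not appear in df1.
df2 = pd.DataFrame({"sub_id": [1, 1, 10],
                    "Text": ["He is normal.", "person is healthy.", "Extra row."]})

first_text = df2.drop_duplicates(subset="sub_id")

# Outer join: keeps sub_id 10 as well, with a missing Weight.
outer = pd.merge(df1, first_text, on="sub_id", how="outer")

# Left join: keeps only the sub_id values present in df1.
left = pd.merge(df1, first_text, on="sub_id", how="left")

print(outer)
print(left)
```

With the left join, sub_id 2 still gets NaN for Text, but the extra sub_id 10 row is dropped.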