I have two dataframes. The first one is:
seq1_id seq2_id dN dS Dist1 Dist_brute kingdom
seq1 seq2 45 56 23 455 eucaryota
seq6 seq9 34 43 34 453 procaryota
seq3 seq98 32 34 21 90 Virus
seq21 seq87 32 12 35 211 Virus
and the second one looks like this:
seq1_id seq2_id dN dS Dist1 Dist_brute
seq1 seq2 45 56 23 455
seq4 seq12 78 45 32 789
seq3 seq98 32 34 21 90
seq21 seq87 32 12 35 211
seq45 seq90 21 23 12 123
seq6 seq9 34 43 34 453
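What I would like to get is a new dataframe that keeps every row of the second one and adds the kingdom column from the first where the rows match: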
seq1_id seq2_id dN dS Dist1 Dist_brute kingdom
seq1 seq2 45 56 23 455 eucaryota
seq4 seq12 78 45 32 789 NaN
seq3 seq98 32 34 21 90 Virus
seq21 seq87 32 12 35 211 Virus
seq45 seq90 21 23 12 123 NaN
seq6 seq9 34 43 34 453 procaryota
Does anyone have an idea? Thanks :))
Answer 0 (score: 1)
For me it works to omit the on parameter, so that the left join merges on all columns shared by both dataframes:
df = df2.merge(df1, how='left')
If you need to define the columns for merge explicitly:
df = df2.merge(df1, on=['seq1_id','seq2_id','dN','dS','Dist1','Dist_brute'], how='left')
print(df)
seq1_id seq2_id dN dS Dist1 Dist_brute kingdom
0 seq1 seq2 45 56 23 455 eucaryota
1 seq4 seq12 78 45 32 789 NaN
2 seq3 seq98 32 34 21 90 Virus
3 seq21 seq87 32 12 35 211 Virus
4 seq45 seq90 21 23 12 123 NaN
5 seq6 seq9 34 43 34 453 procaryota
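
To try this end to end, here is a minimal sketch that rebuilds the two dataframes from the question and runs both variants of the merge. The names df_all, df_keys and key_cols are only illustrative and do not come from the original post, and the data is typed in by hand from the tables above, so treat it as an assumption about how the real data is loaded.

```python
import pandas as pd

# First dataframe from the question: has the extra 'kingdom' column.
df1 = pd.DataFrame({
    "seq1_id": ["seq1", "seq6", "seq3", "seq21"],
    "seq2_id": ["seq2", "seq9", "seq98", "seq87"],
    "dN": [45, 34, 32, 32],
    "dS": [56, 43, 34, 12],
    "Dist1": [23, 34, 21, 35],
    "Dist_brute": [455, 453, 90, 211],
    "kingdom": ["eucaryota", "procaryota", "Virus", "Virus"],
})

# Second dataframe: more rows, no 'kingdom' column.
df2 = pd.DataFrame({
    "seq1_id": ["seq1", "seq4", "seq3", "seq21", "seq45", "seq6"],
    "seq2_id": ["seq2", "seq12", "seq98", "seq87", "seq90", "seq9"],
    "dN": [45, 78, 32, 32, 21, 34],
    "dS": [56, 45, 34, 12, 23, 43],
    "Dist1": [23, 32, 21, 35, 12, 34],
    "Dist_brute": [455, 789, 90, 211, 123, 453],
})

# Variant 1: no 'on' argument, so pandas joins on every column
# name that appears in both dataframes.
df_all = df2.merge(df1, how="left")

# Variant 2: the same join with the key columns spelled out.
key_cols = ["seq1_id", "seq2_id", "dN", "dS", "Dist1", "Dist_brute"]
df_keys = df2.merge(df1, on=key_cols, how="left")

print(df_all)
```

Because df2 is on the left side of the join, all six of its rows are kept in their original order, and rows without a match in df1 get NaN in kingdom. Spelling out the key columns is the safer choice if the two dataframes might later share a column that should not take part in the match.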