I have two DataFrames like these:
df1
Entry Sequence
0 A0A024QZ18 MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ
1 A0A024QZ42 MAALSGGGGGGAEPGQALFNGDMEPEAGAGAGAAASSAADPAIPf
2 A0A024QZB8 MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD
3 A0A024QZP7 MARFGDEMPARYGGGGSGAAAGVVVGSGGGRGAGGSRQGGQPGAQR
4 A0A024QZX5 MRPDRAEAPGPPAMAAGGPGAGSAAPVSSTSSLPLAALNMRVRRRL
5 A0A024QZ33 MNSPGGRGKKKGSGGASNPVPPRPPPPCLAPAPPAAGPAPPPESPH
df2
Seq_id number
0 A0A024QZ18 67
1 A0A024QZ33 45
2 A0A024QZ42 252
3 A0A024QZB8 35
4 A0A024QZP7 34
5 A0A024QZX5 54
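I want to check which entries in df1 (column Entry) are present in the Seq_id column of df2. Where an id matches, I want to add the Sequence from df1 to df2 as a new column; where there is no match, the value should be NaN.
Expected output: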
Seq_id number Sequence
0 A0A024QZ18 67 MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ
1 A0A024QZ33 45 MNSPGGRGKKKGSGGASNPVPPRPPP
2 A0A024QZ42 252 MAALSGGGGGGAEPGQALFNGDMEPEAG
3 A0A024QZB8 35 MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD...
4 A0A024QZP7 34 MARFGDEMPARYGGGGSGAAAGVVVGSGG
5 A0A024QZX5 54 MRPDRAEAPGPPAMAAGGPGAGSAAPVSS
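I am trying to check whether the ids are in that column like this:
df2.Seq_id.isin(df1.Entry)
But I don't know how to add the other column where the ids match and fill in NaN where they don't.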
Answer 0 (score: 2)
I think a simple left join will do what you want.
df1.merge(df2, how='left', left_on='Entry', right_on='Seq_id')
This gives you the output:
Entry Sequence Seq_id number
A0A024QZ18 MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ A0A024QZ18 67
A0A024QZ42 MAALSGGGGGGAEPGQALFNGDMEPEAGAGAGAAASSAADPAIPf A0A024QZ42 252
A0A024QZB8 MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD A0A024QZB8 35
A0A024QZP7 MARFGDEMPARYGGGGSGAAAGVVVGSGGGRGAGGSRQGGQPGAQR A0A024QZP7 34
A0A024QZX5 MRPDRAEAPGPPAMAAGGPGAGSAAPVSSTSSLPLAALNMRVRRRL A0A024QZX5 54
A0A024QZ33 MNSPGGRGKKKGSGGASNPVPPRPPPPCLAPAPPAAGPAPPPESPH A0A024QZ33 45
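The join above keeps the rows and column order of df1. If the layout from the question is wanted instead (df2's rows, with Sequence as a new column and NaN for ids that have no match), the same left join can be started from df2. The following is a minimal sketch, not the answerer's code: the sequences are shortened to their first ten residues for readability, and the id "X00000" is a made-up entry added only to show the NaN case.

```python
import pandas as pd

# Small stand-ins for the frames in the question (sequences shortened).
df1 = pd.DataFrame({
    "Entry": ["A0A024QZ18", "A0A024QZ42", "A0A024QZ33"],
    "Sequence": ["MSGLEMADHM", "MAALSGGGGG", "MNSPGGRGKK"],
})
df2 = pd.DataFrame({
    # "X00000" is a hypothetical id with no counterpart in df1.
    "Seq_id": ["A0A024QZ18", "A0A024QZ33", "A0A024QZ42", "X00000"],
    "number": [67, 45, 252, 1],
})

# Left join from df2: every df2 row is kept, unmatched ids get NaN.
merged = (
    df2.merge(df1, how="left", left_on="Seq_id", right_on="Entry")
       .drop(columns="Entry")  # Entry duplicates Seq_id after the join
)

# Alternative for a single column: look each id up in a Series indexed
# by Entry (assumes the Entry values in df1 are unique).
mapped = df2["Seq_id"].map(df1.set_index("Entry")["Sequence"])

print(merged)
```

The merge form is the more general choice when several columns need to be carried over; the map form is shorter when only one column is added, for example with df2["Sequence"] = mapped.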