Compare two columns across two pandas DataFrames and, where they match, get the value of another column

Asked: 2019-12-17 09:23:36

Tags: python python-3.x pandas

I have two DataFrames like these:

df1
      Entry           Sequence
0    A0A024QZ18    MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ
1    A0A024QZ42    MAALSGGGGGGAEPGQALFNGDMEPEAGAGAGAAASSAADPAIPf
2    A0A024QZB8    MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD
3    A0A024QZP7    MARFGDEMPARYGGGGSGAAAGVVVGSGGGRGAGGSRQGGQPGAQR
4    A0A024QZX5    MRPDRAEAPGPPAMAAGGPGAGSAAPVSSTSSLPLAALNMRVRRRL
5    A0A024QZ33    MNSPGGRGKKKGSGGASNPVPPRPPPPCLAPAPPAAGPAPPPESPH

df2

    Seq_id       number
0   A0A024QZ18     67
1   A0A024QZ33     45
2   A0A024QZ42     252
3   A0A024QZB8     35
4   A0A024QZP7     34
5   A0A024QZX5     54

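I want to check which Entry values in df1 are present in the Seq_id column of df2. Where an Entry is present, I want to add the Sequence from df1 to df2 as a new column, next to the matching id. Where it is not present, the new column should contain NaN.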

Expected output:

    Seq_id       number   Sequence
0   A0A024QZ18     67     MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ
1   A0A024QZ33     45     MNSPGGRGKKKGSGGASNPVPPRPPP
2   A0A024QZ42     252    MAALSGGGGGGAEPGQALFNGDMEPEAG
3   A0A024QZB8     35     MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD...
4   A0A024QZP7     34     MARFGDEMPARYGGGGSGAAAGVVVGSGG
5   A0A024QZX5     54     MRPDRAEAPGPPAMAAGGPGAGSAAPVSS

I am trying to check whether the ids are in that column as follows:

df2.Seq_id.isin(df1.Entry)

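But this only tells me whether each id matches. I don't know how to pull in the other column where they match and get NaN where they don't.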

1 answer:

Answer 0: (score: 2)

I think a simple left join will do what you need.

df1.merge(df2, how='left', left_on='Entry', right_on='Seq_id')

This will give you the following output:

     Entry                                        Sequence      Seq_id  number
 A0A024QZ18  MSGLEMADHMMAMNHGRFPDGTNGLHHHPAHRMGMGQFPSPHHHQQ  A0A024QZ18      67
 A0A024QZ42   MAALSGGGGGGAEPGQALFNGDMEPEAGAGAGAAASSAADPAIPf  A0A024QZ42     252
 A0A024QZB8  MLWWEEVEDCYEREDVQKKTFTKWVNAQFSKFGKQHIENLFSDLQD  A0A024QZB8      35
 A0A024QZP7  MARFGDEMPARYGGGGSGAAAGVVVGSGGGRGAGGSRQGGQPGAQR  A0A024QZP7      34
 A0A024QZX5  MRPDRAEAPGPPAMAAGGPGAGSAAPVSSTSSLPLAALNMRVRRRL  A0A024QZX5      54
 A0A024QZ33  MNSPGGRGKKKGSGGASNPVPPRPPPPCLAPAPPAAGPAPPPESPH  A0A024QZ33      45
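
Note that the merge above puts df1 on the left, so the result follows df1's row order and keeps both key columns. If the goal is the layout from the question (df2's rows with Sequence appended), one possible variation is to put df2 on the left instead. The following is a minimal sketch under some assumptions: the sequences are shortened stand-ins, the id "X00000" is a made-up value added only to show the NaN case, and the `map` alternative assumes the Entry values in df1 are unique.

```python
import pandas as pd

# Small stand-ins for the question's frames (sequences shortened for readability).
df1 = pd.DataFrame({
    "Entry": ["A0A024QZ18", "A0A024QZ42", "A0A024QZ33"],
    "Sequence": ["MSGLEMADHM", "MAALSGGGGG", "MNSPGGRGKK"],
})
df2 = pd.DataFrame({
    # "X00000" is a hypothetical id that does not occur in df1.
    "Seq_id": ["A0A024QZ18", "A0A024QZ33", "A0A024QZ42", "X00000"],
    "number": [67, 45, 252, 1],
})

# Left join with df2 on the left: every df2 row is kept, in df2's order,
# and rows without a match in df1 get NaN in Sequence.
merged = (
    df2.merge(df1, how="left", left_on="Seq_id", right_on="Entry")
       .drop(columns="Entry")  # the duplicate key column is no longer needed
)

# Alternative without a join: look each Seq_id up in an Entry -> Sequence Series.
# Assumes Entry values are unique; duplicated index labels make map() fail.
lookup = df1.set_index("Entry")["Sequence"]
mapped = df2.assign(Sequence=df2["Seq_id"].map(lookup))

print(merged)
```

Both `merged` and `mapped` have the columns Seq_id, number and Sequence. The merge version also copes with duplicated keys (it produces one row per match), while the `map` version leaves df2's row count unchanged by construction.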