Merge information from one dataframe into another based on a matching column

Time: 2018-11-06 21:49:24

Tags: python pandas

I have a dataframe (con) that looks like this:

tag                         Consequence
HSB670|ENSG00000147996      upstream_gene_variant
HSB666|ENSG00000147996      upstream_gene_variant
HSB651|ENSG00000174749      downstream_gene_variant
HSB195|ENSG00000188157      splice_variant

The second dataframe (period) looks like this:

Sample      expr        Gene                Period  tag
"HSB651"    3.207474    "ENSG00000174749"   4       HSB651|ENSG00000174749
"HSB670"    3.797228    "ENSG00000147996"   4       HSB670|ENSG00000147996
"HSB195"    0.214731    "ENSG00000188157"   4       HSB195|ENSG00000188157 
"HSB666"    3.663308    "ENSG00000147996"   5       HSB666|ENSG00000147996

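I want to merge the Consequence information from con into period. Both share the tag column, so wherever the tags match, I want pandas to find the corresponding Consequence and add it to the period dataframe. The final result should look like this: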

Sample      expr        Gene                Period  tag                     Consequence 
"HSB651"    3.207474    "ENSG00000174749"   4       HSB651|ENSG00000174749  downstream_gene_variant
"HSB670"    3.797228    "ENSG00000147996"   4       HSB670|ENSG00000147996  upstream_gene_variant
"HSB195"    0.214731    "ENSG00000188157"   4       HSB195|ENSG00000188157  splice_variant
"HSB666"    3.663308    "ENSG00000147996"   5       HSB666|ENSG00000147996  upstream_gene_variant
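
The desired result is essentially a lookup of Consequence by tag. A minimal sketch of that lookup, assuming each tag occurs exactly once in con and that the quotes shown around Sample and Gene are display artifacts rather than part of the values (both are assumptions, not confirmed above):

```python
import pandas as pd

# Sample rows copied from the tables above. Assumption: the quotes around
# Sample and Gene are display artifacts, and tags carry no stray whitespace.
con = pd.DataFrame({
    "tag": [
        "HSB670|ENSG00000147996",
        "HSB666|ENSG00000147996",
        "HSB651|ENSG00000174749",
        "HSB195|ENSG00000188157",
    ],
    "Consequence": [
        "upstream_gene_variant",
        "upstream_gene_variant",
        "downstream_gene_variant",
        "splice_variant",
    ],
})

period = pd.DataFrame({
    "Sample": ["HSB651", "HSB670", "HSB195", "HSB666"],
    "expr": [3.207474, 3.797228, 0.214731, 3.663308],
    "Gene": [
        "ENSG00000174749",
        "ENSG00000147996",
        "ENSG00000188157",
        "ENSG00000147996",
    ],
    "Period": [4, 4, 4, 5],
    "tag": [
        "HSB651|ENSG00000174749",
        "HSB670|ENSG00000147996",
        "HSB195|ENSG00000188157",
        "HSB666|ENSG00000147996",
    ],
})

# Build a tag -> Consequence lookup, then map it onto period.
# Mapping never adds rows, so period keeps its length and order.
lookup = con.set_index("tag")["Consequence"]
period["Consequence"] = period["tag"].map(lookup)

print(period)
```

This only works while the tags in con are unique; with repeated tags in con, the `map` call would be expected to fail rather than silently duplicate rows, which makes it a useful check on the data as well.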

I have tried the following, but it gives me strange results:

merge = pd.merge(period, con, on="tag", how="left")

Result:

   SampleID      expr             Gene  Period                     tag      Consequence  
0    HSB670  3.797228  ENSG00000147996       4  HSB670|ENSG00000147996      NaN  
1    HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996      upstream_gene_variant   
2    HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996      upstream_gene_variant   
3    HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996      upstream_gene_variant   
4    HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996      upstream_gene_variant   
5    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant       
6    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant   
7    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant   
8    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant   
9    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant   
10   HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157      splice_variant   
11   HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157      splice_variant   
12   HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157      splice_variant 
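
The output above has repeated rows and one NaN. A left merge repeats a left row once for every matching row on the right, and leaves NaN where no key matches, so two possible explanations (neither confirmed by the data shown) are duplicated tag values in one or both frames and invisible differences such as trailing whitespace in tag. The sketch below uses small hypothetical frames to reproduce both effects and then cleans them up; the frame contents and the names `naive`, `con_clean`, `period_clean` and `merged` are illustrative only.

```python
import pandas as pd

# Hypothetical data: one tag is duplicated in con, and one tag in period
# carries a trailing space, so it cannot match its counterpart in con.
con = pd.DataFrame({
    "tag": [
        "HSB666|ENSG00000147996",
        "HSB666|ENSG00000147996",
        "HSB670|ENSG00000147996",
    ],
    "Consequence": [
        "upstream_gene_variant",
        "upstream_gene_variant",
        "upstream_gene_variant",
    ],
})
period = pd.DataFrame({
    "Sample": ["HSB670", "HSB666"],
    "expr": [3.797228, 3.663308],
    "tag": ["HSB670|ENSG00000147996 ", "HSB666|ENSG00000147996"],
})

# The plain left merge: HSB670 gets NaN (no exact match because of the
# trailing space) and HSB666 appears twice (two matching rows in con).
naive = pd.merge(period, con, on="tag", how="left")
print(naive)

# Diagnose: count repeated keys on each side.
print("duplicate tags in con:", con["tag"].duplicated().sum())
print("duplicate tags in period:", period["tag"].duplicated().sum())

# Clean up: strip whitespace from the keys and keep one row per tag in con.
# Note that drop_duplicates keeps the first row per tag, which is only safe
# if the repeated tags all carry the same Consequence.
con_clean = con.assign(tag=con["tag"].str.strip()).drop_duplicates(subset="tag")
period_clean = period.assign(tag=period["tag"].str.strip())

# validate="many_to_one" makes pandas raise if tags in con are not unique,
# so an unexpected row explosion cannot go unnoticed.
merged = pd.merge(
    period_clean, con_clean, on="tag", how="left", validate="many_to_one"
)
print(merged)
```

If period itself contains repeated rows, those are preserved by a left merge and would need a separate `drop_duplicates` on period, assuming the repeats are not intentional.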

0 answers:

No answers yet