I have a dataframe (con) that looks like this:
tag Consequence
HSB670|ENSG00000147996 upstream_gene_variant
HSB666|ENSG00000147996 upstream_gene_variant
HSB651|ENSG00000174749 downstream_gene_variant
HSB195|ENSG00000188157 splice_variant
The second dataframe (period) looks like this:
Sample expr Gene Period tag
"HSB651" 3.207474 "ENSG00000174749" 4 HSB651|ENSG00000174749
"HSB670" 3.797228 "ENSG00000147996" 4 HSB670|ENSG00000147996
"HSB195" 0.214731 "ENSG00000188157" 4 HSB195|ENSG00000188157
"HSB666" 3.663308 "ENSG00000147996" 5 HSB666|ENSG00000147996
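I want to merge the Consequence information from con into period. The two share a tag column, so basically, wherever the tags match, I want pandas to look up the corresponding Consequence and add it to the period dataframe. The final result should look like this: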
Sample expr Gene Period tag Consequence
"HSB651" 3.207474 "ENSG00000174749" 4 HSB651|ENSG00000174749 downstream_gene_variant
"HSB670" 3.797228 "ENSG00000147996" 4 HSB670|ENSG00000147996 upstream_gene_variant
"HSB195" 0.214731 "ENSG00000188157" 4 HSB195|ENSG00000188157 splice_variant
"HSB666" 3.663308 "ENSG00000147996" 5 HSB666|ENSG00000147996 upstream_gene_variant
I have tried the following, but it gives me strange results:
merge = pd.merge(period, con, on="tag", how="left")
Result:
SampleID expr Gene Period tag Consequence
0 HSB670 3.797228 ENSG00000147996 4 HSB670|ENSG00000147996 NaN
1 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996 upstream_gene_variant
2 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996 upstream_gene_variant
3 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996 upstream_gene_variant
4 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996 upstream_gene_variant
5 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749 downstream_gene_variant
6 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749 downstream_gene_variant
7 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749 downstream_gene_variant
8 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749 downstream_gene_variant
9 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749 downstream_gene_variant
10 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157 splice_variant
11 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157 splice_variant
12 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157 splice_variant
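One pattern that would produce repeated rows like these is a con dataframe in which the same tag appears more than once: a left merge emits one output row per matching row on the right. The sketch below rebuilds the sample data shown above and then simulates that situation. The `con_dup` frame and its duplicated rows are hypothetical and exist only for illustration; whether the real con contains duplicate tags is an assumption. The NaN for HSB670 may have a separate cause, such as stray whitespace in the key, which the `str.strip()` call guards against here (also an assumption).

```python
import pandas as pd

# Sample data copied from the question.
con = pd.DataFrame({
    "tag": ["HSB670|ENSG00000147996", "HSB666|ENSG00000147996",
            "HSB651|ENSG00000174749", "HSB195|ENSG00000188157"],
    "Consequence": ["upstream_gene_variant", "upstream_gene_variant",
                    "downstream_gene_variant", "splice_variant"],
})
period = pd.DataFrame({
    "Sample": ["HSB651", "HSB670", "HSB195", "HSB666"],
    "expr": [3.207474, 3.797228, 0.214731, 3.663308],
    "Gene": ["ENSG00000174749", "ENSG00000147996",
             "ENSG00000188157", "ENSG00000147996"],
    "Period": [4, 4, 4, 5],
    "tag": ["HSB651|ENSG00000174749", "HSB670|ENSG00000147996",
            "HSB195|ENSG00000188157", "HSB666|ENSG00000147996"],
})

# Hypothetical: simulate a con that lists some tags more than once.
con_dup = pd.concat([con, con.iloc[[1, 1, 2]]], ignore_index=True)

# Each period row is repeated once per matching row in con_dup.
naive = period.merge(con_dup, on="tag", how="left")

# Count how often each tag occurs to confirm the duplication.
tag_counts = con_dup["tag"].value_counts()

# Normalize the key (assumed whitespace issue), keep one row per tag,
# then merge. validate raises MergeError if con still has duplicate tags.
con_unique = (
    con_dup.assign(tag=con_dup["tag"].str.strip())
           .drop_duplicates(subset="tag")
)
merged = period.merge(con_unique, on="tag", how="left",
                      validate="many_to_one")

print(tag_counts)
print(merged)
```

If the duplicated tags in the real data carry different Consequence values, `drop_duplicates` would silently keep only the first one, so in that case the duplicates need to be resolved deliberately (for example by aggregating them) before merging.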