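Suppose I have two DataFrames, df1 and df2, shown below.
I want to find the intersection of these two DataFrames based on the 'Start' and 'Gene_Symbol' columns: a row should be kept in df1 only if its 'Start' and 'Gene_Symbol' values match a row in df2. The input frames, followed by the result I want, are: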
df1:
   Composite   Beta_value   Chromosome  Start      End        Gene_Symbol
0  cg00000029  0.297449111  chr16       53434200   53434201   RBL2
1  cg00000108  0.660066803  chr3        37417715   37417716   C3orf35
2  cg00000109  0.660066803  chr3        172198247  172198248  FNDC3B
3  cg00000165  0.660066803  chr1        90729117   90729118   C3orf35
4  cg00000236  0.905679244  chr8        42405776   42405777   VDAC3
df2:
    Composite   Beta_value   Chromosome  Start      End        Gene_Symbol
2   cg00000109  0.660066803  chr3        172198247  172198248  FNDC3B
3   cg00000165  0.660066803  chr1        90729117   90729118   C3orf35
4   cg00000236  0.905679244  chr8        42405776   42405777   VDAC3
46  cg00002116  0.017114732  chr17       81703380   81703381   MRPL12
47  cg00002145  0.780230816  chr2        237340893  237340894  COL6A3
48  cg00002190  0.781140134  chr8        19697522   19697523   CSGALNACT1
49  cg00002224  0.220786047  chr8        143038982  143038983  C8orf31
The result I want:
   Composite   Beta_value   Chromosome  Start      End        Gene_Symbol
2  cg00000109  0.660066803  chr3        172198247  172198248  FNDC3B
3  cg00000165  0.660066803  chr1        90729117   90729118   C3orf35
4  cg00000236  0.905679244  chr8        42405776   42405777   VDAC3
By intersection, I do not mean merging the DataFrames and ending up with 12 columns. That is what I get from the call below, which brings in the columns from both of my DataFrames:
intersection = pd.merge(df1, df2, how='inner', on=['Start','Gene_Symbol'])
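To make the problem concrete, here is a minimal sketch that rebuilds the two frames from the tables in the question and runs the plain inner merge. The construction code is an assumption (the question does not show how the frames were created); only the values are taken from the tables. With these six-column frames the merge returns 10 columns, since every non-key column appears twice with `_x` and `_y` suffixes.

```python
import pandas as pd

cols = ["Composite", "Beta_value", "Chromosome", "Start", "End", "Gene_Symbol"]

# Values copied from the df1 table in the question.
df1 = pd.DataFrame([
    ["cg00000029", 0.297449111, "chr16", 53434200, 53434201, "RBL2"],
    ["cg00000108", 0.660066803, "chr3", 37417715, 37417716, "C3orf35"],
    ["cg00000109", 0.660066803, "chr3", 172198247, 172198248, "FNDC3B"],
    ["cg00000165", 0.660066803, "chr1", 90729117, 90729118, "C3orf35"],
    ["cg00000236", 0.905679244, "chr8", 42405776, 42405777, "VDAC3"],
], columns=cols)

# Values copied from the df2 table, including its original index labels.
df2 = pd.DataFrame([
    ["cg00000109", 0.660066803, "chr3", 172198247, 172198248, "FNDC3B"],
    ["cg00000165", 0.660066803, "chr1", 90729117, 90729118, "C3orf35"],
    ["cg00000236", 0.905679244, "chr8", 42405776, 42405777, "VDAC3"],
    ["cg00002116", 0.017114732, "chr17", 81703380, 81703381, "MRPL12"],
    ["cg00002145", 0.780230816, "chr2", 237340893, 237340894, "COL6A3"],
    ["cg00002190", 0.781140134, "chr8", 19697522, 19697523, "CSGALNACT1"],
    ["cg00002224", 0.220786047, "chr8", 143038982, 143038983, "C8orf31"],
], columns=cols, index=[2, 3, 4, 46, 47, 48, 49])

# Plain inner merge on the two key columns: the non-key columns of BOTH
# frames are kept, disambiguated with the default _x / _y suffixes.
merged = pd.merge(df1, df2, how="inner", on=["Start", "Gene_Symbol"])
print(merged.shape)             # (3, 10)
print(merged.columns.tolist())
```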
Answer 0 (score: 1)
When using DataFrame.merge, make sure you select only the key columns from df2, so that the rest of df2's columns are not merged in as well:
keys = ['Start', 'Gene_Symbol']
intersection = df1.merge(df2[keys], on=keys)
   Composite   Beta_value  Chromosome  Start      End        Gene_Symbol
0  cg00000109  0.660067    chr3        172198247  172198248  FNDC3B
1  cg00000165  0.660067    chr1        90729117   90729118   C3orf35
2  cg00000236  0.905679    chr8        42405776   42405777   VDAC3
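One caveat worth checking before relying on this pattern: an inner merge emits one output row per matching key pair, so if df2 contained the same ('Start', 'Gene_Symbol') pair more than once, the matching df1 row would be repeated. The sketch below uses small made-up frames (hypothetical data, not from the question) to show the effect and one possible guard, calling `drop_duplicates()` on the key subset first.

```python
import pandas as pd

keys = ["Start", "Gene_Symbol"]

# Toy data (hypothetical, not from the question): df2 repeats one key pair.
df1 = pd.DataFrame({
    "Composite": ["cg1", "cg2", "cg3"],
    "Start": [100, 200, 300],
    "Gene_Symbol": ["A", "B", "C"],
})
df2 = pd.DataFrame({
    "Composite": ["cg2", "cg2b", "cg9"],
    "Start": [200, 200, 900],
    "Gene_Symbol": ["B", "B", "Z"],
})

# Without de-duplication, the df1 row for (200, "B") matches two df2 rows.
naive = df1.merge(df2[keys], on=keys)

# De-duplicating the key subset keeps each df1 row at most once.
deduped = df1.merge(df2[keys].drop_duplicates(), on=keys)

print(len(naive), len(deduped))  # 2 1
```

In the data shown in the question every key pair in df2 is unique, so both variants give the same three rows there.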
Answer 1 (score: 1)
Use only the required columns from df2.
pd.merge(df1, df2[['Start','Gene_Symbol']], on=['Start','Gene_Symbol'])
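Note that `merge` returns a fresh 0, 1, 2 index, whereas the desired output in the question keeps df1's original labels 2, 3, 4. If those labels matter, a boolean mask is one alternative. The sketch below is a different technique from the one in the answers (it uses `MultiIndex.isin` instead of `merge`), and for brevity it assumes frames trimmed to three of the question's columns.

```python
import pandas as pd

keys = ["Start", "Gene_Symbol"]

# Trimmed versions of the question's frames (only three columns kept here).
df1 = pd.DataFrame({
    "Composite": ["cg00000029", "cg00000108", "cg00000109",
                  "cg00000165", "cg00000236"],
    "Start": [53434200, 37417715, 172198247, 90729117, 42405776],
    "Gene_Symbol": ["RBL2", "C3orf35", "FNDC3B", "C3orf35", "VDAC3"],
})
df2 = pd.DataFrame({
    "Composite": ["cg00000109", "cg00000165", "cg00000236", "cg00002116"],
    "Start": [172198247, 90729117, 42405776, 81703380],
    "Gene_Symbol": ["FNDC3B", "C3orf35", "VDAC3", "MRPL12"],
}, index=[2, 3, 4, 46])

# Build (Start, Gene_Symbol) pairs for each frame and test membership row by row.
df1_pairs = pd.MultiIndex.from_frame(df1[keys])
df2_pairs = pd.MultiIndex.from_frame(df2[keys])
mask = df1_pairs.isin(df2_pairs)

# Filtering with the mask keeps df1's columns and its original index labels.
result = df1[mask]
print(result.index.tolist())  # [2, 3, 4]
```

Because this only filters df1, repeated key pairs in df2 cannot duplicate rows in the result.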