Question

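I searched through many answers; the closest question I found was "Compare 2 columns of 2 different pandas dataframes, if the same insert 1 into the other in Python", but the answer to that person's specific problem was a simple merge, which does not answer the question in a general way.

I have two large dataframes, df1 (typically about 10 million rows) and df2 (about 130 million rows). I need to update the values in three columns of df1 with the values from three columns of df2, based on two df1 columns matching two df2 columns. The order of df1 must remain unchanged, and only rows with matching values get updated.

This is what the dataframes look like: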

df1

chr    snp  x    pos a1 a2
1  1-10020  0  10020  G  A    
1  1-10056  0  10056  C  G    
1  1-10108  0  10108  C  G
1  1-10109  0  10109  C  G    
1  1-10139  0  10139  C  T

Note that the values of "snp" are not always chr-pos; they can take many other values that are not linked to any column (such as rs1234, indel-6032, etc.).

df2

ID           CHR   STOP  OCHR  OSTOP
rs376643643    1  10040     1  10020
rs373328635    1  10066     1  10056    
rs62651026     1  10208     1  10108    
rs376007522    1  10209     1  10109   
rs368469931    3  30247     1  10139

I need to update ['snp', 'chr', 'pos'] in df1 with df2[['ID', 'CHR', 'STOP']], but only when df1[['chr', 'pos']] matches df2[['OCHR', 'OSTOP']].

So in this case, after the update, df1 would look like:

chr       snp  x     pos a1 a2    
1  rs376643643  0  10040  G  A    
1  rs373328635  0  10066  C  G    
1  rs62651026   0  10208  C  G    
1  rs376007522  0  10209  C  G    
3  rs368469931  0  30247  C  T

I have used merge as a workaround:

df1 = pd.merge(df1, df2, how='left', left_on=["chr", "pos"], right_on=["OCHR", "OSTOP"],
                                     left_index=False, right_index=False, sort=False)

then

df1.loc[~df1.OCHR.isnull(), ["snp", "chr", "pos"]] = df1.loc[~df1.OCHR.isnull(), ["ID", "CHR", "STOP"]].values

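and then I drop the extra columns.

Yes, it works, but what would be a way to do this directly by comparing values from the two dataframes? I just don't know how to formulate it, and I couldn't find an answer anywhere; I think a general answer to this could be useful.

I tried this, but it doesn't work: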

df1.loc[(df1.chr==df2.OCHR) & (df1.pos==df2.OSTOP),["snp", "chr", "pos"]] = df2.loc[df2[['OCHR', 'OSTOP']] == df1.loc[(df1.chr==df2.OCHR) & (df1.pos==df2.OSTOP),["chr", "pos"]],['ID', 'CHR', 'STOP']].values

Thanks,

Stephane

Answer 1

You can use the update function (the matching criteria need to be set as the index). I modified your sample data to allow for some mismatches.
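A minimal sketch of that idea, assuming the (chr, pos) and (OCHR, OSTOP) keys are unique within their frames. The third df2 key is changed to 99999 here purely for illustration, so that one row of df1 has no match; the names left, right, key_chr and key_pos are made up for this example and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10020, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026", "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 99999, 10109, 10139],  # 99999: illustrative mismatch
})

names = ["key_chr", "key_pos"]  # hypothetical index level names

# Work on a copy of df1 indexed by its own (chr, pos) pairs.
left = df1.copy()
left.index = pd.MultiIndex.from_arrays([df1["chr"], df1["pos"]], names=names)

# Replacement values, indexed by (OCHR, OSTOP) and named like df1's columns.
right = df2[["ID", "CHR", "STOP"]].rename(
    columns={"ID": "snp", "CHR": "chr", "STOP": "pos"}
)
right.index = pd.MultiIndex.from_arrays([df2["OCHR"], df2["OSTOP"]], names=names)

# update aligns on index and column labels; rows without a match are untouched.
left.update(right)

# Restore df1's original index; the row order never changed.
left.index = df1.index
result = left
```

Only the rows whose keys are found in df2 change, so the third row keeps its original snp, chr and pos. If the keys can repeat in df2, it is probably safer to deduplicate them first, since update is not designed for duplicated index labels in the other frame.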

Answer 2

Start by renaming the columns you want to merge on in df2

df2.rename(columns={'OCHR':'chr','OSTOP':'pos'},inplace=True)

Now merge on these columns

df_merged = pd.merge(df1, df2, how='inner', on=['chr', 'pos']) # you might have to preserve the df1 index at this stage, not sure

Next, you want to build the update frame:

# select the replacement columns and rename them to match the columns of df1
df1_updater = df_merged[['ID','CHR','STOP']].rename(columns={'ID':'snp','CHR':'chr','STOP':'pos'}) # this will be your update frame

Finally, update:

df1.update(df1_updater) # updates in place
#  chr          snp  x    pos a1 a2
#0   1  rs376643643  0  10040  G  A
#1   1  rs373328635  0  10066  C  G
#2   1   rs62651026  0  10208  C  G
#3   1  rs376007522  0  10209  C  G
#4   3  rs368469931  0  30247  C  T

update works by matching on index and column labels, so you might have to carry the index of df1 along through the entire process, then do df1_updater.reindex(... before df1.update(df1_updater)
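One way to carry the index along, as a rough sketch rather than a tested recipe for frames of this size: turn the index of df1 into a regular column before the merge and set it back as the index of the update frame. The label row_id is a hypothetical name chosen for this example, and the third OSTOP value is changed to 99999 only to produce a non-matching row.

```python
import pandas as pd

df1 = pd.DataFrame({
    "chr": [1, 1, 1, 1, 1],
    "snp": ["1-10020", "1-10056", "1-10108", "1-10109", "1-10139"],
    "x": [0, 0, 0, 0, 0],
    "pos": [10020, 10056, 10108, 10109, 10139],
    "a1": ["G", "C", "C", "C", "C"],
    "a2": ["A", "G", "G", "G", "T"],
})
df2 = pd.DataFrame({
    "ID": ["rs376643643", "rs373328635", "rs62651026", "rs376007522", "rs368469931"],
    "CHR": [1, 1, 1, 1, 3],
    "STOP": [10040, 10066, 10208, 10209, 30247],
    "OCHR": [1, 1, 1, 1, 1],
    "OSTOP": [10020, 10056, 99999, 10109, 10139],  # 99999: illustrative mismatch
})

# Rename the key columns of df2 so they match df1 (returns a new frame).
right = df2.rename(columns={"OCHR": "chr", "OSTOP": "pos"})

# Expose df1's index as a column named row_id so it survives the merge.
df_merged = pd.merge(
    df1.rename_axis("row_id").reset_index(),
    right,
    how="inner",
    on=["chr", "pos"],
)

# Update frame: indexed by the original df1 labels, named like df1's columns.
df1_updater = (
    df_merged.set_index("row_id")[["ID", "CHR", "STOP"]]
    .rename(columns={"ID": "snp", "CHR": "chr", "STOP": "pos"})
)

# Only rows whose index label appears in df1_updater are changed.
df1.update(df1_updater)
```

Without the row_id step, the inner merge would renumber the surviving rows 0 to 3, and the values for the last two matches would land on the wrong rows of df1. This again assumes each (OCHR, OSTOP) pair occurs at most once in df2; repeated pairs would give df1_updater duplicate index labels.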

Python pandas: replace values in multiple columns by matching multiple columns from another dataframe

2 answers: