Question

I have 2 data frames, shown below.

df1

PMID       References
12755609    2755610
2844048     987

df2

 PMID       Authors
 2844048    CohKBJKenUP
 2844048    Markar AB
 12755609    GuarnerUJ
 12755609    RoshanRJ
 2755610     John HV
 2755610     Tony KR
 987         Maroi KK

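I want to compare both columns of df1 (PMID and References) against the PMID column of df2 and, wherever a value matches, replace the value in df1 with the corresponding author. Each PMID or reference in df1 can have multiple authors in df2. For example, 2844048 has two authors (CohKBJKenUP and Markar AB), so the result should contain all possible combinations.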

The expected output would look like this:

PMID       References
GuarnerUJ   John HV
RoshanRJ    John HV
GuarnerUJ   Tony KR
RoshanRJ    Tony KR
CohKBJKenUP Maroi KK
Markar AB   Maroi KK

I tried using a for loop, but it runs into memory problems because the files are large.

I also tried the following:

df1['PMID'] = df1['PMID'].map(df2.set_index('PMID')['Authors'])

but it fails with this error:

Reindexing only valid with uniquely valued Index objects

Please suggest a way to get the expected result.

Answer 1

Does this work for you?

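# First merge: attach the authors of each PMID, then drop the numeric PMID.
df = df1.merge(df2, on='PMID')
df = df.drop(columns=['PMID'])
# Second merge: attach the authors of each reference.
df = df.merge(df2, left_on='References', right_on='PMID')
df = df.drop(columns=['References', 'PMID'])
# Only the two author columns (Authors_x, Authors_y) remain; rename them.
df.columns = ['PMID', 'References']
print(df)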


    PMID        References
0   GuarnerUJ   John HV
1   GuarnerUJ   Tony KR
2   RoshanRJ    John HV
3   RoshanRJ    Tony KR
4   CohKBJKenUP Maroi KK
5   Markar AB   Maroi KK
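
For anyone who wants to reproduce this, here is a self-contained sketch of the same two-merge idea. The sample frames are rebuilt from the tables in the question, and the function name `replace_ids_with_authors` and the intermediate column names `PMID_author` and `References_author` are made up for illustration. It also shows the likely reason the `map` attempt fails: `map` needs a lookup Series with a unique index, and df2 repeats each PMID once per author.

```python
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({"PMID": [12755609, 2844048],
                    "References": [2755610, 987]})
df2 = pd.DataFrame({
    "PMID": [2844048, 2844048, 12755609, 12755609, 2755610, 2755610, 987],
    "Authors": ["CohKBJKenUP", "Markar AB", "GuarnerUJ", "RoshanRJ",
                "John HV", "Tony KR", "Maroi KK"],
})

# df2 has several rows per PMID, so it cannot serve as a one-to-one
# lookup table for Series.map.
print(df2["PMID"].is_unique)  # False


def replace_ids_with_authors(pairs, authors):
    """Hypothetical helper: swap both ID columns of `pairs` for author names."""
    # Lookup table keyed on PMID.
    by_pmid = authors.rename(columns={"Authors": "PMID_author"})
    # The same lookup table, keyed on References instead.
    by_ref = authors.rename(columns={"PMID": "References",
                                     "Authors": "References_author"})
    # Each inner merge repeats a row once per matching author, which
    # produces every PMID-author / reference-author combination.
    merged = pairs.merge(by_pmid, on="PMID").merge(by_ref, on="References")
    out = merged[["PMID_author", "References_author"]]
    out = out.rename(columns={"PMID_author": "PMID",
                              "References_author": "References"})
    return out.reset_index(drop=True)


result = replace_ids_with_authors(df1, df2)
print(result)
```

Because both merges are inner joins, rows of df1 whose PMID or reference has no entry in df2 are dropped from the result. If those rows should be kept, passing `how='left'` to `merge` is one option.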
