Check for common rows between two DataFrames and use df1

Time: 2019-03-24 21:20:01

Tags: python pandas

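I have two DataFrames, df1 and df2. Both share the columns URL and Age. I want to check which values of df1['URL'] appear in df2['URL'], replace df2['Age'] with df1['Age'] for the matching rows, and leave the remaining rows of df2 unchanged.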

df1

URL          Category                         Age
google.com   [IAB19, Technology & Computing]  A
youtube.com  [IAB25, Non-Standard Content]    H
facebook.co  [IAB14, Society]                 A
amazon.com   [IAB22, Shopping]                M
wpedia.org   [IAB5, Education]                E

df2

URL           Category       Age
google.com    [IAB19, BBCA]  T
youtube.com   [IAB25, AACB]  T
facebook.co   [IAB14, HLGB   T
amazon.com    [IAB22, ETCL]  T
wpedia.org    [IAB5, J TCL]  T
example1.com  [LHTB, 2213]   A
example2.com  [OPCL, 9909]   A
example3.com  [PPRS, 7656]   A
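
For anyone who wants to reproduce this, the two tables above can be built roughly as follows. This is only a sketch: Category is stored as plain strings (an assumption, since the real column may hold lists), and the unclosed bracket in the facebook.co row is copied as it appears in the table.

```python
import pandas as pd

# Sample frames transcribed from the tables above.
# Assumption: Category is kept as plain strings, not Python lists.
df1 = pd.DataFrame({
    "URL": ["google.com", "youtube.com", "facebook.co", "amazon.com", "wpedia.org"],
    "Category": [
        "[IAB19, Technology & Computing]",
        "[IAB25, Non-Standard Content]",
        "[IAB14, Society]",
        "[IAB22, Shopping]",
        "[IAB5, Education]",
    ],
    "Age": ["A", "H", "A", "M", "E"],
})

df2 = pd.DataFrame({
    "URL": [
        "google.com", "youtube.com", "facebook.co", "amazon.com", "wpedia.org",
        "example1.com", "example2.com", "example3.com",
    ],
    "Category": [
        "[IAB19, BBCA]", "[IAB25, AACB]", "[IAB14, HLGB", "[IAB22, ETCL]",
        "[IAB5, J TCL]", "[LHTB, 2213]", "[OPCL, 9909]", "[PPRS, 7656]",
    ],
    "Age": ["T", "T", "T", "T", "T", "A", "A", "A"],
})
```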

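Now, wherever a URL from df1['URL'] also exists in df2['URL'], I want to replace df2['Age'] with the corresponding df1['Age'], and leave the rows whose URLs are not common unchanged.

So the expected output would be: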

df3

URL           Category       Age
google.com    [IAB19, BBCA]  A
youtube.com   [IAB25, AACB]  H
facebook.co   [IAB14, HLGB   A
amazon.com    [IAB22, ETCL]  M
wpedia.org    [IAB5, J TCL]  E
example1.com  [LHTB, 2213]   A
example2.com  [OPCL, 9909]   A
example3.com  [PPRS, 7656]   A
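
The "check for common URLs" step on its own can be sketched with `Series.isin`. The frames below are cut-down stand-ins for df1 and df2 (Category omitted), and the names `is_common` and `common_rows` are made up for illustration. This only identifies the rows that would be affected; it does not replace anything.

```python
import pandas as pd

# Cut-down stand-ins for the frames in the question (Category omitted).
df1 = pd.DataFrame({"URL": ["google.com", "youtube.com"], "Age": ["A", "H"]})
df2 = pd.DataFrame({
    "URL": ["google.com", "youtube.com", "example1.com"],
    "Age": ["T", "T", "A"],
})

# Boolean mask: True where a df2 URL also appears in df1.
is_common = df2["URL"].isin(df1["URL"])

# Rows of df2 whose Age would be replaced.
common_rows = df2[is_common]
```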

1 Answer:

Answer 0 (score: 1)

map + fillna

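map lets us replace Age for the common URLs, and .fillna then restores the original values for the URLs that did not match. This assumes URL is a unique key in df1: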

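df3 = df2.copy()
# Look up each URL in df1; URLs missing from df1 give NaN, which fillna replaces with the original Age
df3['Age'] = df3.URL.map(df1.set_index('URL').Age).fillna(df3.Age)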

#            URL            Category Age
#0    google.com  [IAB19, BBCA]        A
#1   youtube.com  [IAB25, AACB]        H
#2   facebook.co  [IAB14, HLGB         A
#3    amazon.com  [IAB22, ETCL]        M
#4    wpedia.org  [IAB5, J TCL]        E
#5  example1.com   [LHTB, 2213]        A
#6  example2.com   [OPCL, 9909]        A
#7  example3.com   [PPRS, 7656]        A
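
The uniqueness assumption matters: if a URL appears more than once in df1, the lookup Series has a non-unique index and map generally raises an error instead of picking a value. One possible workaround, sketched below on small made-up frames, is to drop the duplicate URLs before building the lookup. The name `lookup` and the choice of `keep="first"` are assumptions for this example, not part of the answer above; which duplicate to keep depends on your data.

```python
import pandas as pd

# Small made-up frames; google.com appears twice in df1 on purpose.
df1 = pd.DataFrame({
    "URL": ["google.com", "youtube.com", "google.com"],
    "Age": ["A", "H", "M"],
})
df2 = pd.DataFrame({
    "URL": ["google.com", "youtube.com", "example1.com"],
    "Age": ["T", "T", "A"],
})

# Keep one row per URL so the lookup Series has a unique index.
# keep="first" is an arbitrary choice made for this sketch.
lookup = df1.drop_duplicates("URL", keep="first").set_index("URL")["Age"]

# Same map + fillna pattern as above, applied to a copy so df2 is untouched.
df3 = df2.copy()
df3["Age"] = df3["URL"].map(lookup).fillna(df3["Age"])
```

Here df3 takes Age from the first google.com row of df1, and example1.com keeps its own Age because it has no match.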