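I have two DataFrames, say df1 and df2. Both have the same columns, including URL and Age. I want to look up df1['URL'] in df2['URL'], replace df2['Age'] with df1['Age'] for the matching rows, and leave the remaining rows of df2 unchanged.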
df1
URL Category Age
google.com [IAB19, Technology & Computing] A
youtube.com [IAB25, Non-Standard Content] H
facebook.co [IAB14, Society] A
amazon.com [IAB22, Shopping] M
wpedia.org [IAB5, Education] E
df2
URL Category Age
google.com [IAB19, BBCA] T
youtube.com [IAB25, AACB] T
facebook.co [IAB14, HLGB T
amazon.com [IAB22, ETCL] T
wpedia.org [IAB5, J TCL] T
example1.com [LHTB, 2213] A
example2.com [OPCL, 9909] A
example3.com [PPRS, 7656] A
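Now I want to check whether any URL from df1['URL'] exists in df2['URL']. Where it does, I want to replace df2['Age'] with df1['Age'], and leave the rows whose URLs are not in common unchanged.
So the expected output would be: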
df3
URL Category Age
google.com [IAB19, BBCA] A
youtube.com [IAB25, AACB] H
facebook.co [IAB14, HLGB A
amazon.com [IAB22, ETCL] M
wpedia.org [IAB5, J TCL] E
example1.com [LHTB, 2213] A
example2.com [OPCL, 9909] A
example3.com [PPRS, 7656] A
Answer 0 (score: 1)
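map + fillna: map replaces Age for the URLs the two frames have in common, and .fillna then restores the original value for the URLs that did not match. This assumes URL is a unique key in df1: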
df3 = df2.copy()
df3['Age'] = df3.URL.map(df1.set_index('URL').Age).fillna(df3.Age)
# URL Category Age
#0 google.com [IAB19, BBCA] A
#1 youtube.com [IAB25, AACB] H
#2 facebook.co [IAB14, HLGB A
#3 amazon.com [IAB22, ETCL] M
#4 wpedia.org [IAB5, J TCL] E
#5 example1.com [LHTB, 2213] A
#6 example2.com [OPCL, 9909] A
#7 example3.com [PPRS, 7656] A
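
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same map + fillna approach. The sample frames are a shortened version of the data in the question, and the helper name replace_age is hypothetical. The drop_duplicates call is an added assumption, not part of the answer above: it keeps the first Age per URL so the lookup index stays unique, since map against a Series with a duplicated index raises an error.

```python
import pandas as pd

# Shortened sample data modelled on the question (Category column omitted).
df1 = pd.DataFrame({
    "URL": ["google.com", "youtube.com", "facebook.co"],
    "Age": ["A", "H", "A"],
})
df2 = pd.DataFrame({
    "URL": ["google.com", "youtube.com", "facebook.co", "example1.com"],
    "Age": ["T", "T", "T", "A"],
})


def replace_age(source, target):
    """Return a copy of target whose Age comes from source where URLs match.

    Hypothetical helper for illustration only.
    """
    # Assumption: if source repeats a URL, keep its first Age so that the
    # lookup index is unique.
    lookup = source.drop_duplicates("URL").set_index("URL")["Age"]
    out = target.copy()
    # map yields NaN for URLs missing from the lookup; fillna puts the
    # original Age back for those rows.
    out["Age"] = out["URL"].map(lookup).fillna(out["Age"])
    return out


df3 = replace_age(df1, df2)
print(df3)
```

Because the helper works on a copy, df2 itself is left untouched. Note that fillna would also restore the old value if df1 held a missing Age for a matching URL, which may or may not be what you want.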