I have 2 pandas DataFrames:
df1:
    name      exchange
0    bob         Bobby
1   toon  Looney Tunes
2  donal   Donald Duck
df2:
         strings
0  watching toon
1     love donal
2    nice bobguy
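What I want to do is iterate over the rows of df2 and check whether each value contains a df1['name'] value. If it does, replace the df1['name'] substring in df2 with the corresponding df1['exchange']. The output should be: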
df2:
                 strings
0  watching Looney Tunes
1       love Donald Duck
2          nice Bobbyguy
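What I have tried so far:
for row_index, row in df2.iterrows():
    for row_alias_index, row_alias in df1.iterrows():
        if row_alias['name'] in row['strings']:
            # Take the replacement from the df1 row (row_alias), not the df2 row
            df2.at[row_index, 'strings'] = row['strings'].replace(row_alias['name'], row_alias['exchange'])
            break
df1 has many rows, and I don't think two nested for loops are a viable approach.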
Answer 0 (score: 1)
Use Series.replace with regex=True to replace substrings:
df2['strings'] = df2['strings'].replace(df1.set_index('name')['exchange'], regex=True)
print(df2)
                 strings
0  watching Looney Tunes
1       love Donald Duck
2          nice Bobbyguy
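For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea using the sample data from the question. Converting the lookup Series to a plain dict with to_dict() is my own choice for clarity, not something the answer requires. Note that with regex=True each name is interpreted as a regular-expression pattern, not as a literal string.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "name": ["bob", "toon", "donal"],
    "exchange": ["Bobby", "Looney Tunes", "Donald Duck"],
})
df2 = pd.DataFrame({"strings": ["watching toon", "love donal", "nice bobguy"]})

# Build a {pattern: replacement} lookup from df1.
# Assumption: a plain dict is used here instead of passing the Series directly.
mapping = df1.set_index("name")["exchange"].to_dict()

# regex=True makes replace() substitute matching substrings
# instead of requiring the whole cell to equal the key.
result = df2["strings"].replace(mapping, regex=True)
print(result.tolist())
```

Without regex=True, replace() only swaps cells whose entire value equals a key, so none of these rows would change.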
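If you also want to test which rows contain a value to replace, use Series.str.contains with the names joined by the regex | (OR), and apply the Series.replace solution only to the matching rows: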
s = df1.set_index('name')['exchange']
m = df2['strings'].str.contains('|'.join(s.index))
print(m)
0    True
1    True
2    True
Name: strings, dtype: bool
df2.loc[m, 'strings'] = df2.loc[m, 'strings'].replace(s, regex=True)
print(df2)
                 strings
0  watching Looney Tunes
1       love Donald Duck
2          nice Bobbyguy
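The masked variant can be sketched as a runnable script as well. Two things here are my own additions for illustration and are not part of the answer: a fourth row that matches no name, so the mask is not all True, and a re.escape call on each name, which is only needed if the names might contain regex metacharacters such as "." or "+".

```python
import re

import pandas as pd

df1 = pd.DataFrame({
    "name": ["bob", "toon", "donal"],
    "exchange": ["Bobby", "Looney Tunes", "Donald Duck"],
})
# The last row is a hypothetical extra row that contains none of the names.
df2 = pd.DataFrame(
    {"strings": ["watching toon", "love donal", "nice bobguy", "no match here"]}
)

# Assumption: escape each name so it is matched literally by the regex engine.
mapping = {
    re.escape(name): repl for name, repl in zip(df1["name"], df1["exchange"])
}

# A single alternation pattern that matches any of the names.
pattern = "|".join(mapping)

# Boolean mask of rows containing at least one name.
mask = df2["strings"].str.contains(pattern, regex=True)

# Run the substring replacement only on the matching rows.
df2.loc[mask, "strings"] = df2.loc[mask, "strings"].replace(mapping, regex=True)
print(df2)
```

Whether the mask actually saves time depends on how many rows match; if nearly every row contains a name, the extra str.contains pass may not pay for itself.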