Question

I have two pandas DataFrames:

df1:

   name    exchange
0  bob     Bobby
1  toon    Looney Tunes
2  donal   Donald Duck


df2:
    strings
0   watching toon
1   love donal
2   nice bobguy

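What I want is to iterate over the rows of df2 and check whether each value contains a df1['name'] value. If it does, replace the df1['name'] substring in df2 with the corresponding df1['exchange']. The output should be: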

df2:
    strings
0   watching Looney Tunes
1   love Donald Duck
2   nice Bobbyguy

This is what I have tried so far:

    for row_index, row in df2.iterrows():
        for row_alias_index, row_alias in df1.iterrows():
            if row_alias['name'] in row['strings']:
                df2.at[row_index, 'strings'] = row['strings'].replace(row_alias['name'], row_alias['exchange'])
                break

df1 has many rows, and I don't think two nested for loops are the right approach.

Answer 1

Use Series.replace with a Series and regex=True to replace substrings:

df2['strings'] = df2['strings'].replace(df1.set_index('name')['exchange'], regex=True)
print (df2)
                 strings
0  watching Looney Tunes
1       love Donald Duck
2          nice Bobbyguy
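
For readers who want to run this end to end, here is a minimal self-contained sketch. The two frames are rebuilt from the question. Passing a plain dict and escaping each name with `re.escape` are additions of this sketch, not part of the answer above; the escaping only matters if a name could contain regex metacharacters such as `.` or `(`.

```python
import re
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "name": ["bob", "toon", "donal"],
    "exchange": ["Bobby", "Looney Tunes", "Donald Duck"],
})
df2 = pd.DataFrame({"strings": ["watching toon", "love donal", "nice bobguy"]})

# Map each name (escaped so it is matched literally) to its replacement
mapping = {re.escape(n): e for n, e in zip(df1["name"], df1["exchange"])}

# regex=True makes replace() substitute substrings instead of whole values
result = df2["strings"].replace(mapping, regex=True)
print(result.tolist())
```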

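If you also want to test which rows contain any of the values to be replaced, use Series.str.contains with the names joined by | (regex OR), and apply the replacement only to the matching rows: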

s = df1.set_index('name')['exchange']
m = df2['strings'].str.contains('|'.join(s.index))
print (m)
0    True
1    True
2    True
Name: strings, dtype: bool

df2.loc[m, 'strings'] = df2.loc[m, 'strings'].replace(s, regex=True)
print (df2)
                 strings
0  watching Looney Tunes
1       love Donald Duck
2          nice Bobbyguy
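
If some names may contain regex metacharacters, or one name may be a prefix of another, one possible variation is to build a single escaped alternation and look up each match in a dict. This is only a sketch under those assumptions, not the approach used in the answer; `lookup`, `names` and `pattern` are illustrative names introduced here.

```python
import re
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "name": ["bob", "toon", "donal"],
    "exchange": ["Bobby", "Looney Tunes", "Donald Duck"],
})
df2 = pd.DataFrame({"strings": ["watching toon", "love donal", "nice bobguy"]})

lookup = dict(zip(df1["name"], df1["exchange"]))

# Try longer names first so a longer name wins over a shorter prefix of it
names = sorted(lookup, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(n) for n in names))

# Each match is replaced by its dict entry in a single pass over the string
replaced = df2["strings"].str.replace(
    pattern, lambda m: lookup[m.group(0)], regex=True
)
print(replaced.tolist())
```

Because every string is scanned once, text inserted by one replacement is never rescanned for other names.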
