How do I find whether strings from one pandas Series appear as substrings in another Series?

Asked: 2020-03-06 07:23:00

Tags: python-3.x pandas dataframe

I have 2 pandas DataFrames:

df1:

   name    exchange
0  bob     Bobby
1  toon    Looney Tunes
2  donal   Donald Duck


df2:
    strings
0   watching toon
1   love donal
2   nice bobguy

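What I want to do is iterate over the rows of df2 and check whether each value contains one of the df1['name'] values. If it does, replace that df1['name'] substring in df2 with the corresponding df1['exchange'] value. The output should be: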

df2:
    strings
0   watching Looney Tunes
1   love Donald Duck
2   nice Bobbyguy

What I have tried so far:

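    # Check each df2 string against every df1 name and replace the first match found
    for row_index, row in df2.iterrows():
        for row_alias_index, row_alias in df1.iterrows():
            if row_alias['name'] in row['strings']:
                # The replacement comes from df1 (row_alias); df2's row has no 'exchange' column
                df2.at[row_index, 'strings'] = row['strings'].replace(row_alias['name'], row_alias['exchange'])
                break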

df1 has many rows, so I don't think two nested for loops are the right approach.

1 answer:

Answer 0 (score: 1)

Use Series.replace with regex=True to replace substrings:

df2['strings'] = df2['strings'].replace(df1.set_index('name')['exchange'], regex=True)
print (df2)
                 strings
0  watching Looney Tunes
1       love Donald Duck
2          nice Bobbyguy
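For anyone who wants to run the first approach end to end, here is a self-contained sketch that builds the sample frames from the question. One addition not in the answer: the names are passed through re.escape, assuming they should be matched as literal text (this only matters if a name contains regex metacharacters such as . or +). With regex=True the exchange values are used as regex replacement strings, so a backslash in one of them would need escaping.

```python
import re

import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "name": ["bob", "toon", "donal"],
    "exchange": ["Bobby", "Looney Tunes", "Donald Duck"],
})
df2 = pd.DataFrame({"strings": ["watching toon", "love donal", "nice bobguy"]})

# Map each name to its exchange value; escaping the keys (an assumption, not in
# the answer) makes names with regex metacharacters match literally.
mapping = {re.escape(name): exch for name, exch in zip(df1["name"], df1["exchange"])}

# With regex=True each dict key is treated as a pattern and replaced wherever
# it occurs inside a string, i.e. substring replacement.
df2["strings"] = df2["strings"].replace(mapping, regex=True)
print(df2)
```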

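If you also want to first test which rows contain any of the names, use Series.str.contains with the names joined by the regex | (OR), then apply Series.replace only to the matching rows: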

s = df1.set_index('name')['exchange']
m = df2['strings'].str.contains('|'.join(s.index))
print (m)
0    True
1    True
2    True
Name: strings, dtype: bool

df2.loc[m, 'strings'] = df2.loc[m, 'strings'].replace(s, regex=True)
print (df2)
                 strings
0  watching Looney Tunes
1       love Donald Duck
2          nice Bobbyguy
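A different technique, not taken from the answer: build one alternation pattern from all names and let Series.str.replace call a function for each match. Series.replace with a dict applies the patterns one after another, so text produced by one replacement can in principle be matched by a later pattern; this single-pass version inserts each replacement once and does not rescan it. The sketch assumes the names are plain literal strings and that the callable-repl form of str.replace (which requires regex=True) is available in your pandas version. The helper names lookup, names and pattern are illustrative only.

```python
import re

import pandas as pd

df1 = pd.DataFrame({
    "name": ["bob", "toon", "donal"],
    "exchange": ["Bobby", "Looney Tunes", "Donald Duck"],
})
df2 = pd.DataFrame({"strings": ["watching toon", "love donal", "nice bobguy"]})

# Hypothetical helper: matched name -> replacement text
lookup = dict(zip(df1["name"], df1["exchange"]))

# One alternation pattern; longer names come first so that when two names
# could match at the same position, the longer one wins.
names = sorted(lookup, key=len, reverse=True)
pattern = re.compile("|".join(map(re.escape, names)))

# The callable receives each re.Match and returns the text to insert.
df2["strings"] = df2["strings"].str.replace(
    pattern, lambda m: lookup[m.group(0)], regex=True
)
print(df2)
```

Sorting by length is a design choice: regex alternation takes the first alternative that matches at a given position, so without it a short name could shadow a longer one that starts with it.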