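Whole-word substring replacement with pandas str.replace

Date: 2019-01-02 15:48:19

Tags: python string pandas replace

I have a sample dataframe with a text column that contains strings, including the word "eng" and the word "engine".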

ID  Text
1   eng is here
2   engine needs washing
3   eng is overheating 

I want to replace the word "eng" with the word "engine". I am using the following code:

df['Text'] = df['Text'].str.replace('eng', 'engine')

But this messes up the text in my second row. The second row becomes

ID  Text
2   engineine needs washing

Is there a way to do the replacement so that it only happens when the whole word is exactly "eng"?

3 answers:

Answer 0: (score: 3)

Wrap your keyword in the word-boundary token \b:

df['Text'].str.replace(r'\beng\b', 'engine', regex=True)

0           engine is here
1     engine needs washing
2    engine is overheating 
Name: Text, dtype: object

If you want to replace multiple keywords this way, pass a dictionary to replace with the regex=True switch:

repl = {'eng' : 'engine'}
repl = {rf'\b{k}\b': v for k, v in repl.items()}

df['Text'].replace(repl, regex=True)

0           engine is here
1     engine needs washing
2    engine is overheating 
Name: Text, dtype: object
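
As a minimal, self-contained sketch of the dictionary approach: the dataframe below is rebuilt from the question, the second mapping entry ('temp' to 'temperature') is a made-up example, and re.escape is an addition not in the answer, included on the assumption that some keys might contain regex metacharacters.

```python
import re

import pandas as pd

# Sample data rebuilt from the question (row 3 keeps its trailing space).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Text": ["eng is here", "engine needs washing", "eng is overheating "],
})

# Hypothetical mapping; only 'eng' comes from the question.
repl = {"eng": "engine", "temp": "temperature"}

# Wrap each key in \b...\b; re.escape keeps metacharacters in keys literal.
patterns = {rf"\b{re.escape(k)}\b": v for k, v in repl.items()}

result = df["Text"].replace(patterns, regex=True)
print(result.tolist())
```

Note that \b only behaves as expected when each key starts and ends with a word character (a letter, digit, or underscore).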

Answer 1: (score: 2)

Adding a trailing space to both strings solves the problem with your own code:

df['Text'].str.replace('eng ', 'engine ')
Out[736]: 
0            engine is here
1      engine needs washing
2    engine is overheating 
Name: Text, dtype: object
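
It may be worth checking where the trailing-space trick and a word-boundary pattern differ. The sketch below uses invented sample strings (including the made-up name "Peng") and passes regex explicitly, since the default of str.replace has changed between pandas versions.

```python
import pandas as pd

# Invented samples: "eng" at the end, before a comma, and inside another word.
s = pd.Series(["eng is here", "check the eng", "eng, again", "Peng is here"])

# Literal replacement of "eng " (with trailing space).
spaced = s.str.replace("eng ", "engine ", regex=False)

# Word-boundary replacement for comparison.
bounded = s.str.replace(r"\beng\b", "engine", regex=True)

print(spaced.tolist())
print(bounded.tolist())
```

With these inputs the literal version leaves "eng" untouched at the end of a string or before punctuation, and it rewrites the tail of "Peng", so it seems safest only when "eng" is always followed by a space and never ends a longer word.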

Update

df.Text.str.split(' ',expand=True).replace('eng','engine').fillna('').apply(' '.join,1)
Out[752]: 
0           engine is here 
1     engine needs washing 
2    engine is overheating 
dtype: object
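
The expand=True version pads shorter rows with empty strings, which is where the extra trailing spaces in the output come from. A possible variant, sketched here as an assumption rather than as part of the answer, does the same exact-token comparison row by row so the original spacing is preserved:

```python
import pandas as pd

df = pd.DataFrame({
    "Text": ["eng is here", "engine needs washing", "eng is overheating "],
})

def replace_token(text, old="eng", new="engine"):
    # Split on single spaces, swap exact matches, and rejoin unchanged otherwise.
    return " ".join(new if word == old else word for word in text.split(" "))

tokens = df["Text"].apply(replace_token)
print(tokens.tolist())
```

Like the original one-liner, this only recognizes tokens separated by spaces, so "eng," with a trailing comma would not be replaced.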

Answer 2: (score: 1)

You can try a regular expression like this:

import re
df['Text'] = df['Text'].map(lambda x: re.sub(r'\beng\b', 'engine', x))

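The \b token in this regular expression matches word boundaries, so "eng" is replaced only when it stands as a whole word, that is, when it is bounded by whitespace, punctuation, or the start or end of the string.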
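
To see what \b does and does not treat as a boundary, here is a small sketch using only the standard re module. The sample strings are made up for illustration.

```python
import re

# Compile once so the same pattern is reused for every string.
pattern = re.compile(r"\beng\b")

samples = ["eng", "eng, then", "(eng)", "engine", "weng is", "eng_1"]
results = [pattern.sub("engine", s) for s in samples]

for before, after in zip(samples, results):
    print(f"{before!r} -> {after!r}")
```

The last sample is left alone because the underscore counts as a word character, so there is no boundary between "g" and "_".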