I have a sample DataFrame with a text column whose strings include both the word "eng" and the word "engine".
ID Text
1 eng is here
2 engine needs washing
3 eng is overheating
I want to replace the word "eng" with the word "engine". I use the following code:
df['Text'] = df['Text'].str.replace('eng', 'engine')
But this mangles the text in my second row. The second row becomes:
ID Text
2 engineine needs washing
Is there a way to do the replacement so that "eng" is replaced only when it appears as a whole word?
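To reproduce the problem, here is a minimal sketch that rebuilds the sample frame from the question (columns ID and Text as above). It passes regex=False explicitly so the behavior is the same across pandas versions, and shows that str.replace performs a plain substring replacement, so the "eng" inside "engine" is rewritten too:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Text": ["eng is here", "engine needs washing", "eng is overheating"],
})

# Plain substring replacement: every occurrence of "eng" is replaced,
# including the one at the start of "engine"
naive = df["Text"].str.replace("eng", "engine", regex=False)
print(naive.tolist())
# ['engine is here', 'engineine needs washing', 'engine is overheating']
```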
Answer 0 (score: 3)
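Wrap your keyword in word-boundary characters (\b). Note that regex=True is required in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default:
df['Text'].str.replace(r'\beng\b', 'engine', regex=True)
0 engine is here
1 engine needs washing
2 engine is overheating
Name: Text, dtype: object
If you want to replace several keywords this way, build a dictionary of patterns and pass it to replace with the regex=True switch:
repl = {'eng': 'engine'}
repl = {rf'\b{k}\b': v for k, v in repl.items()}
df['Text'].replace(repl, regex=True)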
0 engine is here
1 engine needs washing
2 engine is overheating
Name: Text, dtype: object
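A slightly extended sketch of the dictionary approach. The second keyword ("temp" to "temperature") is hypothetical and only there to show more than one mapping. It also wraps each key in re.escape so that keys containing regex metacharacters are matched literally; that is an addition, not part of the original answer:

```python
import re
import pandas as pd

df = pd.DataFrame({"Text": ["eng is here", "engine needs washing", "eng temp is high"]})

# Hypothetical keyword mapping for illustration
repl = {"eng": "engine", "temp": "temperature"}

# Turn each key into a whole-word pattern; re.escape protects keys like "c++"
pattern_map = {rf"\b{re.escape(k)}\b": v for k, v in repl.items()}

# With regex=True, each dict key is treated as a regex and substituted inside the strings
result = df["Text"].replace(pattern_map, regex=True)
print(result.tolist())
# ['engine is here', 'engine needs washing', 'engine temperature is high']
```

The patterns are applied one after another, so this relies on no replacement value matching another key's pattern.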
Answer 1 (score: 2)
Adding a trailing space to the pattern solves the problem with your own code:
df['Text'].str.replace('eng ', 'engine ')
Out[736]:
0 engine is here
1 engine needs washing
2 engine is overheating
Name: Text, dtype: object
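A caveat worth checking before relying on the trailing-space pattern. This minimal sketch uses hypothetical edge-case strings: the space-based pattern misses "eng" at the end of a string and still rewrites "eng " when it is the tail of a longer word, while the \b pattern from Answer 0 handles both:

```python
import pandas as pd

# Hypothetical edge cases: "eng" at the end of a string, and "eng" ending another word
s = pd.Series(["eng is here", "check the eng", "beng is a name"])

trailing = s.str.replace("eng ", "engine ", regex=False)
boundary = s.str.replace(r"\beng\b", "engine", regex=True)

print(trailing.tolist())  # ['engine is here', 'check the eng', 'bengine is a name']
print(boundary.tolist())  # ['engine is here', 'check the engine', 'beng is a name']
```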
Update
df['Text'].str.split(' ', expand=True).replace('eng', 'engine').fillna('').apply(' '.join, axis=1)
Out[752]:
0 engine is here
1 engine needs washing
2 engine is overheating
dtype: object
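The same split-and-join idea can be written per row. One reason to prefer this: with expand=True, rows with fewer words are padded with missing values, and after fillna('') the join leaves trailing spaces on those rows. A minimal sketch, assuming words are separated by single spaces; replace_token is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({"Text": ["eng is here", "engine needs washing", "eng overheating"]})

def replace_token(text, old="eng", new="engine"):
    # Split on single spaces, swap only tokens that are exactly `old`, then re-join
    return " ".join(new if tok == old else tok for tok in text.split(" "))

result = df["Text"].apply(replace_token)
print(result.tolist())
# ['engine is here', 'engine needs washing', 'engine overheating']
```

Because tokens are compared for exact equality, "eng." or "(eng)" would not be replaced; the regex answers handle punctuation.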
Answer 2 (score: 1)
You can try a regular expression like this:
import re
df['Text'] = df['Text'].map(lambda x: re.sub(r'\beng\b', 'engine', x))
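The \b token in this regex matches a word boundary: the position between a word character and a non-word character, or the start or end of the string. So "eng" only matches as a whole word, not as part of "engine".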
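A quick check of what \b actually matches, using a few hypothetical strings. Punctuation and the start or end of the string count as boundaries just like spaces, while a letter directly before "eng" (as in "beng") does not:

```python
import re

# Hypothetical inputs covering spaces, punctuation, and embedded occurrences
samples = ["eng is here", "engine needs washing", "check the eng.", "(eng) fault", "beng"]

results = [re.sub(r"\beng\b", "engine", s) for s in samples]
print(results)
# ['engine is here', 'engine needs washing', 'check the engine.', '(engine) fault', 'beng']
```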