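So for example, I have the string "perfect bear hunts", and I want to replace the word that comes before "bear" with "the".
So the resulting string would be "the bear hunts".
I thought I would use
re.sub(r"\w+ bear", "the", "perfect bear hunts")
but it replaces "bear" too. How do I exclude "bear" from being replaced while still using it in the match?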
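Answer 0: (score: 2)
Use a positive lookahead to replace everything before bear:
re.sub(r".+(?=bear )", "the ", "perfect bear swims")
.+
matches one or more of any character (except line terminators), so it consumes everything before "bear ", not only the preceding word.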
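To make the behavior of this pattern concrete, here is a minimal sketch. The second and third sample strings are illustrative inputs of my own, not taken from the answer. They show that the greedy .+ also matches spaces, and that the trailing space inside the lookahead prevents a match when "bear" ends the string.

```python
import re

pattern = r".+(?=bear )"

# Works as intended for the example in the answer.
one_word = re.sub(pattern, "the ", "perfect bear swims")

# .+ also matches spaces, so every word before "bear " is replaced.
two_words = re.sub(pattern, "the ", "before perfect bear swims")

# The lookahead requires a space after "bear", so nothing matches here.
at_end = re.sub(pattern, "the ", "perfect bear")

print(one_word)   # the bear swims
print(two_words)  # the bear swims
print(at_end)     # perfect bear
```

If only the single preceding word should be replaced, \w+ followed by a space is probably the safer prefix than .+.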
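Answer 1: (score: 2)
Like the other answers, I would use a positive lookahead assertion.
Then, to address the problem Rawing raised in a couple of comments (what about words like "beard"?), I would add (\b|$)
. This matches a word boundary or the end of the string, so you only match the word bear
and nothing longer.
So you get the following: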
import re

def bear_replace(string):
    # Replace the single word (plus its trailing space) that precedes the whole word "bear".
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)
And the test cases (using pytest):
import pytest

@pytest.mark.parametrize('string, expected', [
    ("perfect bear swims", "the bear swims"),
    # We only replace the first word before 'bear'
    ("before perfect bear swims", "before the bear swims"),
    # 'beard' isn't matched
    ("a perfect beard", "a perfect beard"),
    # We handle the case where 'bear' is the end of the string
    ("perfect bear", "the bear"),
    # 'bear' is followed by a non-space punctuation character
    ("perfect bear-string", "the bear-string"),
])
def test_bear_replace(string, expected):
    assert bear_replace(string) == expected
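The same idea can be sketched for an arbitrary target word. The helper name replace_word_before and its parameters are hypothetical, introduced only for illustration. The sketch uses \b alone: Python's re module treats the end of the string after a word character as a word boundary, so the |$ alternative should not be needed. It assumes the target ends in a word character.

```python
import re

def replace_word_before(text, target, replacement="the"):
    """Replace the word directly before each whole-word `target`.

    Hypothetical helper; assumes `target` ends in a word character,
    otherwise the trailing \\b would not behave as intended.
    """
    # re.escape keeps any regex metacharacters in `target` literal.
    pattern = r"\w+ (?=" + re.escape(target) + r"\b)"
    # A function replacement avoids backslash processing in `replacement`.
    return re.sub(pattern, lambda match: replacement + " ", text)

print(replace_word_before("perfect bear swims", "bear"))
print(replace_word_before("a perfect beard", "bear"))
print(replace_word_before("big cat sleeps", "cat", "a"))
```

Passing a function instead of a replacement string is a design choice here: it means a replacement containing a backslash is inserted verbatim rather than being interpreted as a group reference.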
Answer 2: (score: 1)
Lookbehind
and lookahead
assertions in regular expressions are what you're looking for.
re.sub(r".+(?=bear)", "the ", "perfect bear swims")
Answer 3: (score: 1)
An alternative to using a lookahead:
use a group ()
to capture the part you want to keep, then use \1
to reinsert it.
re.sub(r"\w+ (bear)", r"the \1", "perfect bear swims")
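As a rough comparison of the two approaches, the sketch below runs a lookahead and a capturing group side by side. The \b added to both patterns and the "a bear bear" input are my own additions for illustration. The two agree on the example from the question, but the group consumes "bear", so that word cannot serve as the "word before" for a following match.

```python
import re

text = "perfect bear swims"

# Lookahead: "bear" is checked but not consumed.
with_lookahead = re.sub(r"\w+ (?=bear\b)", "the ", text)

# Capturing group: "bear" is consumed, then written back via \1.
with_group = re.sub(r"\w+ (bear\b)", r"the \1", text)

print(with_lookahead)  # the bear swims
print(with_group)      # the bear swims

# The approaches differ when matches would overlap.
repeated = "a bear bear"
print(re.sub(r"\w+ (?=bear\b)", "the ", repeated))   # the the bear
print(re.sub(r"\w+ (bear\b)", r"the \1", repeated))  # the bear bear
```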