Question

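So, for example, I have the string "perfect bear hunts", and I want to replace the word before the occurrence of "bear" with "the".

So the resulting string would be "the bear hunts".

I thought I would use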

re.sub(r"\w+ bear","the","perfect bear hunts")

But it also replaces "bear". How can I exclude "bear" from being replaced while still using it for the match?

Answer 1

Use a positive lookahead to replace everything before bear:

re.sub(".+(?=bear )","the ","perfect bear swims")

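.+ matches one or more of any character (except line terminators), so this replaces everything before "bear ", not only the word directly in front of it.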
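To make the difference concrete, here is a small sketch comparing the greedy .+ with \w+ on a string that contains "bear" twice. The sample string and variable names are my own, not from the question.

```python
import re

# Hypothetical sample text with two occurrences of "bear ".
text = "a perfect bear and another bear swims"

# Greedy .+ runs up to the LAST "bear " and replaces everything before it.
greedy = re.sub(r".+(?=bear )", "the ", text)

# \w+ plus a space replaces only the single word before each "bear ".
single_word = re.sub(r"\w+ (?=bear )", "the ", text)

print(greedy)       # the bear swims
print(single_word)  # a the bear and the bear swims
```

If you only want to touch the one word in front of "bear", the \w+ form is probably closer to what the question asks for.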

Answer 2

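Like the other answers, I would use a positive lookahead assertion.

Then, to address the issue Rawing raised in a couple of comments (what about words like "beard"?), I would add (\b|$). This matches a word boundary or the end of the string, so you only match the word bear, and nothing longer.

So you get the following: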

import re

def bear_replace(string):
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)

And the test cases (using pytest):

import pytest

@pytest.mark.parametrize('string, expected', [
    ("perfect bear swims", "the bear swims"),

    # We only replace the one word directly before 'bear'
    ("before perfect bear swims", "before the bear swims"),

    # 'beard' isn't captured
    ("a perfect beard", "a perfect beard"),

    # We handle the case where 'bear' is the end of the string
    ("perfect bear", "the bear"),

    # 'bear' is followed by a non-space punctuation character
    ("perfect bear-string", "the bear-string"),
])
def test_bear_replace(string, expected):
    assert bear_replace(string) == expected
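If you don't have pytest at hand, the effect of the boundary can also be checked directly. The sketch below is only an illustration (the constant names are made up). It also suggests that in Python's re module \b by itself already matches at the end of the string when the last character is a word character, so the |$ alternative looks redundant rather than harmful.

```python
import re

# Hypothetical pattern names, used only for this comparison.
NO_BOUNDARY = r"\w+ (?=bear)"
WITH_BOUNDARY = r"\w+ (?=bear\b)"

# Without the boundary, "beard" satisfies the lookahead and "perfect " is replaced.
without_boundary = re.sub(NO_BOUNDARY, "the ", "a perfect beard")

# With \b, "beard" no longer satisfies the lookahead, so nothing changes.
with_boundary = re.sub(WITH_BOUNDARY, "the ", "a perfect beard")

# \b also matches when "bear" is the last word in the string.
at_end = re.sub(WITH_BOUNDARY, "the ", "perfect bear")

print(without_boundary)  # a the beard
print(with_boundary)     # a perfect beard
print(at_end)            # the bear
```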

Answer 3

Lookbehind and lookahead regular expressions are what you are looking for.

re.sub(".+(?=bear)", "the ", "perfect bear swims")
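Since lookbehind is mentioned above but not shown, here is a rough sketch of both assertions side by side. The second call (replacing the word after "bear") is my own example and not part of the question. Note that Python's re requires a lookbehind to have a fixed width, which "bear " does.

```python
import re

text = "perfect bear swims"

# Lookahead: replace the word BEFORE " bear"; " bear" is checked but not consumed.
before = re.sub(r"\w+(?= bear\b)", "the", text)

# Lookbehind: replace the word AFTER "bear "; "bear " is checked but not consumed.
after = re.sub(r"(?<=\bbear )\w+", "sleeps", text)

print(before)  # the bear swims
print(after)   # perfect bear sleeps
```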

Answer 4

An alternative to using a lookahead:

Use a group () to capture the part you want to keep, then reinsert it with \1.

re.sub(r"\w+ (bear)",r"the \1","perfect bear swims")
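A slightly fuller sketch of the group approach, assuming you also want the word boundary so that "beard" is left alone. The function name is hypothetical, and \g<1> is just the explicit spelling of \1. One difference worth knowing: the group consumes "bear", whereas a lookahead does not, so the two can give different results when "bear" is directly followed by another "bear".

```python
import re

def bear_replace_group(string):
    # Group 1 captures "bear"; \g<1> puts it back into the replacement.
    return re.sub(r"\w+ (bear\b)", r"the \g<1>", string)

basic = bear_replace_group("perfect bear swims")
beard = bear_replace_group("a perfect beard")

# The group consumes the first "bear", so it cannot serve as the word
# before the second "bear"; the lookahead version leaves it available.
with_group = bear_replace_group("big bear bear")
with_lookahead = re.sub(r"\w+ (?=bear\b)", "the ", "big bear bear")

print(basic)           # the bear swims
print(beard)           # a perfect beard
print(with_group)      # the bear bear
print(with_lookahead)  # the the bear
```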

Python - re.sub without replacing part of the regex
