Python - re.sub without replacing a part of the regex

Time: 2017-10-05 14:53:16

Tags: python regex

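For example, I have the string "perfect bear hunts", and I want to replace the word that comes before "bear" with "the".

The resulting string would be "the bear hunts".

I thought I would use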

re.sub(r"\w+ bear", "the", "perfect bear hunts")

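But that replaces "bear" as well. How do I exclude "bear" from the replacement while still using it in the match?

4 answers:

Answer 0: (score: 2)

Use a positive lookahead to replace everything before "bear":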

re.sub(".+(?=bear )","the ","perfect bear swims")

.+ matches one or more of any character (except line terminators).
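Two consequences of this pattern are worth checking, shown here as a small sketch rather than as part of the answer. Because .+ is greedy, everything before "bear " is replaced, not just the preceding word, and the trailing space inside the lookahead means a "bear" at the very end of the string is not matched. The helper name replace_before_bear is made up for illustration.

```python
import re

# Hypothetical helper wrapping the pattern from this answer.
def replace_before_bear(text):
    # .+ is greedy; (?=bear ) requires "bear " to follow but does not consume it.
    return re.sub(r".+(?=bear )", "the ", text)

print(replace_before_bear("perfect bear swims"))         # the bear swims
print(replace_before_bear("before perfect bear swims"))  # the bear swims
print(replace_before_bear("perfect bear"))               # perfect bear (unchanged)
```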

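Answer 1: (score: 2)

Like the other answers, I would use a positive lookahead assertion.

Then, to address the issue Rawing raised in a couple of comments (what about words like "beard"?), I would add (\b|$). This matches a word boundary or the end of the string, so you match only the word bear and nothing longer.

So you get the following: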

import re

def bear_replace(string):
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)

And the test cases (using pytest):

import pytest

@pytest.mark.parametrize('string, expected', [
    ("perfect bear swims", "the bear swims"),

    # We only replace the word immediately before 'bear'
    ("before perfect bear swims", "before the bear swims"),

    # 'beard' isn't captured
    ("a perfect beard", "a perfect beard"),

    # We handle the case where 'bear' is the end of the string
    ("perfect bear", "the bear"),

    # 'bear' is followed by a non-space punctuation character
    ("perfect bear-string", "the bear-string"),
])
def test_bear_replace(string, expected):
    assert bear_replace(string) == expected
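As a side note, \b in Python's re module already matches at the end of the string when the last character is a word character, so the |$ alternative is probably redundant here. A minimal sketch under that assumption (bear_replace_simple is a hypothetical name, not part of the answer):

```python
import re

# Hypothetical variant of bear_replace that relies on \b alone.
def bear_replace_simple(string):
    # \b covers "bear swims", "bear-string", and a trailing "bear" alike.
    return re.sub(r"\w+ (?=bear\b)", "the ", string)

for text in [
    "perfect bear swims",
    "before perfect bear swims",
    "a perfect beard",
    "perfect bear",
    "perfect bear-string",
]:
    print(repr(text), "->", repr(bear_replace_simple(text)))
```

It gives the same results as the parametrized cases above.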

Answer 2: (score: 1)

Lookbehind and lookahead regular expressions are what you are looking for.

re.sub(".+(?=bear)", "the ", "perfect bear swims")

Answer 3: (score: 1)

An alternative to using a lookahead:

Use a group () to capture the part you want to keep, then reinsert it with \1.

re.sub(r"\w+ (bear)", r"the \1", "perfect bear swims")
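The group approach can be combined with a word boundary so that "beard" is left alone. The backreference can also be written as \g<1>, which stays unambiguous if a digit follows it in the replacement. A rough sketch, where replace_with_group is a made-up name:

```python
import re

# Hypothetical helper illustrating the capture-group approach.
def replace_with_group(text):
    # (bear) is consumed by the match, so \g<1> puts it back;
    # \b stops the pattern from matching inside "beard".
    return re.sub(r"\w+ (bear)\b", r"the \g<1>", text)

print(replace_with_group("perfect bear swims"))  # the bear swims
print(replace_with_group("a perfect beard"))     # a perfect beard (unchanged)
print(replace_with_group("perfect bear"))        # the bear
```

Unlike the lookahead versions, this pattern consumes "bear" as part of the match, which is why it has to be reinserted.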