Python: case-insensitive find and replace that keeps the found word

Time: 2015-06-24 09:26:48

Tags: python regex

I know this question has been answered before in "Case insensitive replace", but mine is a bit different.

What I want is to search a text for certain keywords and wrap each match in <b></b>. The following examples illustrate four different possibilities:

Keywords = ['hell', 'world']

Input sentence = 'Hell is a wonderful place to say hello and sell shells'

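Expected output 1 = '<b>Hell</b> is a wonderful place to say hello and sell shells' - (The text is not replaced with the keyword 'hell'; the found word 'Hell' is kept. Only whole-word matches are wrapped.)

Expected output 2 = '<b>Hell</b> is a wonderful place to say <b>hello</b> and sell shells' - (Only words that start with the keyword are wrapped. Note that the whole word is wrapped even though the match is partial.)

Expected output 3 = '<b>Hell</b> is a wonderful place to say <b>hello</b> and sell <b>shells</b>' - (Any occurrence of "hell" triggers a replacement, and the whole word containing it is wrapped.)

Expected output 4 = '<b>Hell</b> is a wonderful place to say <b>hell</b>o and sell s<b>hell</b>s' - (Any occurrence of "hell" is wrapped, but only the matched part, not the whole word. The rest of the word is left unchanged.)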

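The linked SO question replaces the found word with the keyword, which is not what I want. I want to keep the casing of the input sentence intact. Can someone help me find solutions for all four cases above?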

The code I tried:

>>> import re
>>> insensitive_hippo = re.compile(re.escape('hell'), re.IGNORECASE)
>>> insensitive_hippo.sub('hell', 'Hell is a wonderful place to say hello and sell shells')
'hell is a wonderful place to say hello and sell shells'

But this does not keep the found word intact.
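The casing is lost here because the replacement is the literal string 'hell', which overwrites whatever was matched. A minimal sketch of one way around it, assuming the standard `re` module: refer back to the matched text with `\g<0>` in the replacement. The variable names `sentence`, `pattern`, `lowered`, and `wrapped` are only illustrative.

```python
import re

sentence = 'Hell is a wonderful place to say hello and sell shells'
pattern = re.compile(re.escape('hell'), re.IGNORECASE)

# A literal replacement string overwrites the matched text, so 'Hell' becomes 'hell'.
lowered = pattern.sub('hell', sentence)

# \g<0> stands for the whole match, so the original casing is carried over.
wrapped = pattern.sub(r'<b>\g<0></b>', sentence)

print(lowered)
print(wrapped)
```

With no word boundaries in the pattern, this corresponds to expected output 4; the other three cases need anchors around the keyword.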

1 answer:

Answer 0: (score: 2)

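import re

x = 'Hell is a wonderful place to say hello and sell shells'

print(re.sub(r"\b(hell)\b", r"<b>\1</b>", x, flags=re.I))       # case 1: whole word only
print(re.sub(r"\b(hell\S*)", r"<b>\1</b>", x, flags=re.I))      # case 2: words starting with the keyword
print(re.sub(r"\b(\S*hell\S*)", r"<b>\1</b>", x, flags=re.I))   # case 3: words containing the keyword
print(re.sub(r"(hell)", r"<b>\1</b>", x, flags=re.I))           # case 4: only the matched part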

Output:

<b>Hell</b> is a wonderful place to say hello and sell shells
<b>Hell</b> is a wonderful place to say <b>hello</b> and sell shells
<b>Hell</b> is a wonderful place to say <b>hello</b> and sell <b>shells</b>
<b>Hell</b> is a wonderful place to say <b>hell</b>o and sell s<b>hell</b>s
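The patterns above hard-code a single keyword, while the question lists several. A rough sketch of how the same four cases might be generalized to a keyword list follows. The function `highlight` and its `mode` names are hypothetical, not part of the original answer, and it uses `\w*` rather than `\S*` so that trailing punctuation stays outside the tags; that is a deliberate deviation from the answer's patterns.

```python
import re

def highlight(text, keywords, mode):
    """Wrap case-insensitive keyword matches in <b></b>, keeping the original casing.

    mode (hypothetical names for the four cases in the question):
      'whole'    -> whole-word matches only            (case 1)
      'prefix'   -> whole words starting with a keyword (case 2)
      'contains' -> whole words containing a keyword    (case 3)
      'anywhere' -> only the matched fragment           (case 4)
    """
    # Escape each keyword so regex metacharacters are treated literally.
    alternation = '|'.join(re.escape(k) for k in keywords)
    templates = {
        'whole': r'\b(?:{})\b',
        'prefix': r'\b(?:{})\w*',
        'contains': r'\w*(?:{})\w*',
        'anywhere': r'(?:{})',
    }
    pattern = re.compile(templates[mode].format(alternation), re.IGNORECASE)
    # \g<0> re-inserts the text exactly as it appeared in the input.
    return pattern.sub(r'<b>\g<0></b>', text)

sentence = 'Hell is a wonderful place to say hello and sell shells'
for mode in ('whole', 'prefix', 'contains', 'anywhere'):
    print(highlight(sentence, ['hell', 'world'], mode))
```

For the sample sentence the four modes should reproduce the four outputs listed above. An empty keyword list is not handled and would need a guard.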