Question

I have two functions that work perfectly with ASCII strings and use the re module:

import re

def findWord(w):
    return re.compile(r'\b{0}.*?\b'.format(w), flags=re.IGNORECASE).findall


def replace_keyword(w, c, x):
    return re.sub(r"\b({0}\S*)".format(w), r'<mark style="background-color:{0}">\1</mark>'.format(c), x, flags=re.I)

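However, they fail on UTF-8 encoded strings that contain accented characters. Searching further, I found that the regex module is better suited to Unicode strings, so I have spent the last few hours trying to port the functions to regex, but nothing seems to work. This is what I have so far: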

import regex

def findWord(w):
    return regex.compile(r'\b{0}.*?\b'.format(w), flags=regex.IGNORECASE|regex.UNICODE).findall

def replace_keyword(w, c, x):
    return regex.sub(r"\b({0}\S*)".format(w), r'<mark style="background-color:{0}">\1</mark>'.format(c), x, flags=regex.IGNORECASE|regex.UNICODE)

However, I keep getting an "ordinal not in range" error whenever I use accented (unnormalized) UTF-8 encoded strings.
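In Python 2, an "ordinal not in range(128)" error is typically raised when a byte string and a unicode string are mixed and Python falls back to the ASCII codec. The following is only a minimal sketch of the usual workaround, assuming the input really is UTF-8 bytes: decode once at the boundary and match against text. The helper name to_text and the sample string are hypothetical and not part of the original functions.

```python
import re


def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string, decoding bytes if needed.

    Hypothetical helper: assumes byte input is encoded as `encoding`.
    """
    if isinstance(value, bytes):  # in Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value


# Sample UTF-8 encoded bytes for "Un café noir" (illustrative data only).
raw = b"Un caf\xc3\xa9 noir"
text = to_text(raw)

# With re.UNICODE, \w and \b treat accented letters as word characters.
matches = re.findall(u"\\bcaf\\w*", text, flags=re.IGNORECASE | re.UNICODE)
```

Once both the pattern and the text are unicode objects, no implicit ASCII conversion takes place during matching.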

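EDIT: The suggested possible duplicate, "Regular expression to match non-English characters?", does not solve my problem. First, I want to use the Python re/regex modules. Second, I want both the find and the replace functions to work in Python.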

EDIT: I am using Python 2.

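EDIT: If you think you can help me get these two functions working with Python 3, please let me know. I hope I can call Python 3 from my Python 2 script just to use these two functions.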
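For reference, a possible Python 3 version of the two functions might look like the sketch below. It relies on the fact that in Python 3 str patterns are Unicode-aware by default, so the standard re module is enough. Two things here are assumptions rather than part of the original code: the keyword is passed through re.escape so it is matched literally, and both keyword and text are normalized to NFC (via the hypothetical helper _nfc) so that decomposed accents compare equal to precomposed ones.

```python
import re
import unicodedata


def _nfc(text):
    """Hypothetical helper: normalize to NFC so 'e' + combining accent == 'é'."""
    return unicodedata.normalize("NFC", text)


def find_word(w):
    # re.escape is an added assumption: the keyword is matched literally.
    pattern = r"\b{0}.*?\b".format(re.escape(_nfc(w)))
    finder = re.compile(pattern, flags=re.IGNORECASE).findall
    # Normalize the searched text too before matching.
    return lambda text: finder(_nfc(text))


def replace_keyword(w, c, x):
    pattern = r"\b({0}\S*)".format(re.escape(_nfc(w)))
    replacement = r'<mark style="background-color:{0}">\1</mark>'.format(c)
    return re.sub(pattern, replacement, _nfc(x), flags=re.IGNORECASE)
```

Usage would mirror the original functions, for example find_word("caf")(article_text) and replace_keyword("caf", "yellow", article_text), where article_text is a str.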

Answer 1

I think I am getting somewhere. I am trying to make this work without the re or regex modules, using plain Python:

found_keywords = []
for word in keyword_list:
    if word.lower() in article_text.lower():
        found_keywords.append(word)

for word in found_keywords:  # highlight the found keyword in the text
    article_text = article_text.lower().replace(word.lower(), '<mark style="background-color:%s">%s</mark>' % (yellow_color, word))

Now I only need to replace the found keywords in a case-insensitive way, and I will be happy.

Please help me with this last step: replacing the keywords case-insensitively without using re or regex, so that it works for accented strings.
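One way this last step could be approached, shown here only as a sketch: search in a lowercased copy of the text, but build the result from slices of the original so its casing is preserved. The function name highlight is hypothetical, the inputs are assumed to be text (unicode) strings rather than raw bytes, and the sketch simply returns the text unchanged in the rare case where lower() changes the string length, since the indices would no longer line up.

```python
def highlight(text, keyword, color):
    """Wrap every case-insensitive occurrence of keyword in a <mark> tag."""
    lowered_text = text.lower()
    lowered_kw = keyword.lower()

    # Sketch limitation: bail out on an empty keyword, or when lowercasing
    # changed the length so positions no longer map back to the original.
    if not lowered_kw or len(lowered_text) != len(text):
        return text

    pieces = []
    pos = 0
    while True:
        hit = lowered_text.find(lowered_kw, pos)
        if hit == -1:
            break
        end = hit + len(lowered_kw)
        pieces.append(text[pos:hit])  # untouched text before the match
        pieces.append('<mark style="background-color:%s">%s</mark>'
                      % (color, text[hit:end]))  # original casing kept
        pos = end
    pieces.append(text[pos:])  # remainder after the last match
    return "".join(pieces)
```

Unlike calling article_text.lower().replace(...), this keeps the rest of the article in its original case and highlights the matched text exactly as it was written.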

Handling accented Unicode characters with the Python regex module
