Question

I currently use this simple script to search for a tag in a string:

import re

tag = "#tag"
text = "test string with #tag inserted"
match = re.search(tag, text, re.IGNORECASE)  # matches

Now suppose the text contains an a-acute (á):

tag = "#tag"
text = "test string with #tág inserted"
match = re.search(tag, text, re.IGNORECASE) #does not match :(

How can I make this match work? It should also work for other special characters (é, è, í, and so on).

Thanks in advance!

Answer 1

You can use unidecode to normalize the text:

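import re
from unidecode import unidecode  # third-party package: pip install unidecode

tag = "#tag"
text = u"test string with #tág inserted and a #tag"
text = unidecode(text)  # transliterate to ASCII, so "#tág" becomes "#tag"
re.findall(tag, text, re.IGNORECASE)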

Out:

['#tag', '#tag']
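
If installing unidecode is not an option, a similar effect can probably be had with the standard library alone. The sketch below is an alternative to the answer's approach, not part of it: the helper name strip_accents is made up for illustration, and it assumes that dropping combining marks after NFKD decomposition is acceptable for your text. It handles accented Latin letters such as á, é, è, í, but unlike unidecode it does not transliterate characters that have no decomposition (for example ø or ß).

```python
import re
import unicodedata

def strip_accents(s):
    # Hypothetical helper: decompose each character (NFKD), then drop
    # the combining marks so that "á" is reduced to plain "a".
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

tag = "#tag"
text = "test string with #tág inserted and a #TAG"

# re.escape guards against tags that contain regex metacharacters.
matches = re.findall(re.escape(tag), strip_accents(text), re.IGNORECASE)
print(matches)  # ['#tag', '#TAG']
```

Note that the matches come from the normalized copy of the text, so the accent is gone in the returned strings.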