Question

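I have a function with a single argument, a unicode string containing accented characters. I want to find one or more patterns in that string and print them out.

I can't figure out how to format the pattern correctly, how to use re.match with unicode correctly, or how to extract match.groups() with unicode. It's really easy with ASCII. Ugh.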

Python 2.7

sentence = "These characters, ÄÜ, are special."

def findInSentence(sentence):

    pattern = re.compile("ÄÜ", re.UNICODE)
    return re.match(sentence, pattern).groups()

Answer 1

Use re.search instead of re.match.

re.match is anchored at the start of the string, whereas re.search scans the whole string.

The syntax of search and match is:

re.search(pattern, string, flags=0)
re.match(pattern, string, flags=0)

You have swapped the pattern and the string.
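
Both points can be seen in a minimal sketch like the one below. It reuses the sentence from the question; the variable names are only illustrative, and the `u` prefix is kept so the snippet behaves the same on Python 2.7 (with a source-encoding declaration) and on Python 3.

```python
# -*- coding: utf-8 -*-
import re

sentence = u"These characters, ÄÜ, are special."
pattern = re.compile(u"ÄÜ")

# Pattern first, string second. re.match only tries position 0,
# and the sentence starts with "These", so there is no match.
anchored = re.match(pattern, sentence)

# re.search scans the whole string and finds the first occurrence.
found = re.search(pattern, sentence)

print(anchored)          # None
print(found.group(0))    # the matched text
print(found.span())      # (start, end) offsets of the match
```

If the match really is expected at the very start of the string, re.match remains the right tool; for "find it anywhere", re.search is the one to reach for.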

Answer 2

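There are several things to get right when using Unicode:

Declare the encoding of the source file.
Save the file in the declared encoding.
Use Unicode strings.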
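
As a rough illustration of why the last point matters (variable names here are illustrative only): the same two characters are two code points as a Unicode string but four bytes once encoded as UTF-8, so a byte-string pattern cannot be expected to line up with a Unicode subject.

```python
# -*- coding: utf-8 -*-
# The line above declares the source encoding. Python 2.7 requires it for
# non-ASCII literals; Python 3 assumes UTF-8 when it is absent.

text = u"ÄÜ"                     # Unicode string: 2 code points
encoded = text.encode("utf-8")   # byte string: 2 bytes per character here

print(len(text))       # 2
print(len(encoded))    # 4

# Decoding with the same encoding gives back the original text.
round_trip = encoded.decode("utf-8")
print(round_trip == text)   # True
```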

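Also, use re.search correctly, as @M42 pointed out.

There are also no capturing groups in the search pattern, so groups() has nothing to return; use .group(0) to print the match (if there is one).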
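
A small sketch of the difference, assuming a hypothetical helper named find_in_sentence that also guards against the no-match case (re.search returns None then, and calling .group on None raises AttributeError):

```python
# -*- coding: utf-8 -*-
import re

sentence = u"These characters, ÄÜ, are special."

def find_in_sentence(sentence):
    # Hypothetical helper: return the matched text, or None if absent.
    match = re.search(u"ÄÜ", sentence)
    if match is None:
        return None
    return match.group(0)

# No parentheses in the pattern means no capturing groups.
no_groups = re.search(u"ÄÜ", sentence)
print(no_groups.groups())      # () : an empty tuple

# With parentheses, groups() returns one entry per capturing group.
with_groups = re.search(u"(Ä)(Ü)", sentence)
print(len(with_groups.groups()))   # 2

print(find_in_sentence(sentence))
print(find_in_sentence(u"plain ASCII"))   # None
```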

Note that re.UNICODE is not needed in this case, because it only affects how the special sequences \w, \W, \b, \B, \d, \D, \s and \S work, and none of them are used here.
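
For contrast, here is a sketch of a case where the flag does matter. The pattern \w+ is not part of the question; it is used here only to show the effect, and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
import re

sentence = u"These characters, ÄÜ, are special."

# A literal pattern matches the same way with or without the flag.
literal = re.search(u"ÄÜ", sentence).group(0)

# With re.UNICODE, accented letters count as word characters,
# so "ÄÜ" comes back as a word of its own.
words = re.findall(u"\\w+", sentence, re.UNICODE)

# Without the flag, Python 2.7 treats \w as [a-zA-Z0-9_] and skips "ÄÜ".
# Python 3 str patterns are Unicode-aware by default, so this result
# depends on the interpreter version.
words_default = re.findall(u"\\w+", sentence)

print(words[2])
print(len(words))   # 5
```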

Corrected code:

# coding: utf-8
import re

sentence = u"These characters, ÄÜ, are special."

def findInSentence(sentence):
    pattern = re.compile(u"ÄÜ", re.UNICODE)
    return re.search(pattern, sentence).group(0)

print findInSentence(sentence)

Output:

ÄÜ

Python 2.7: correct syntax for re matching of accented characters in a Unicode string?
