Question

我从用户那里获得一个输入正则表达式，该表达式保存为unicode字符串。在将输入字符串作为正则表达式对象进行压缩之前，是否必须将输入字符串转换为原始字符串？还是没必要？我是否正确地将其转换为原始字符串？

import re
input_regex_as_unicode = u"^(.){1,36}$"
string_to_check = "342342dedsfs"

# leave as unicode
compiled_regex = re.compile(input_regex_as_unicode)
match_string = re.match(compiled_regex, string_to_check)

# convert to raw
compiled_regex = re.compile(r'' + input_regex_as_unicode)
match_string = re.match(compiled_regex, string_to_check)

@Ahsanul Haque，我的问题是更具规则性的表达式，正则表达式在将其转换为正则表达式对象时是否正确处理unicode字符串

Answer 1

re模块正确处理unicode字符串和普通字符串，您不需要将它们转换为任何字符串（但在使用字符串时应该保持一致）。

没有像“原始字符串”这样的东西。如果它可以帮助您使用包含反斜杠的字符串，则可以在代码中使用原始字符串表示法。例如，要匹配换行符，您可以使用'\\n'，u'\\n'，r'\n'或ur'\n'。

由于r''和''评估为相同的字符串，因此在示例中使用原始字符串表示法不执行任何操作。

如何在python正则表达式中正确使用unicode字符串

1 个答案: