Question

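I am currently using re2, re and pcre for regex matching in Python. When I use a regular expression with an ASCII group name, such as re.compile("(?P<name>(\S*))"), it compiles without error. But when the group name contains a Unicode character, such as re.compile("(?P<årsag>(\S*))"), I get an error and the pattern does not compile. Is there any Python library that fully supports Unicode?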

Edit: see my output:

>>> import regex
>>> m = regex.compile(r"(?P<årsag>(\S*))")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/regex.py", line 331, in compile
    return _compile(pattern, flags, kwargs)
  File "/usr/local/lib/python2.7/site-packages/regex.py", line 499, in _compile
    caught_exception.pos)
_regex_core.error: bad character in group name at position 10

Answer 1

You need to use the external regex module. The regex module supports Unicode characters in the names of named capture groups.

>>> import regex
>>> m = regex.compile(r"(?P<årsag>(\S*))")
>>> m.search('foo').group('årsag')
'foo'
>>> m.search('foo bar').group('årsag')
'foo'
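
For comparison, here is a minimal sketch that assumes Python 3, where string literals are Unicode by default. Under that assumption the standard-library re module should also accept a non-ASCII group name, because group names only need to be valid identifiers. The traceback in the question comes from Python 2.7, where r"..." is a byte string, which may be why the same pattern is rejected there; that explanation is an assumption, not something the question confirms.

```python
import re

# Assumes Python 3: the pattern is a Unicode str, so the non-ASCII
# group name "årsag" counts as a valid identifier.
pattern = re.compile(r"(?P<årsag>(\S*))")

# \S* stops at the first whitespace character, so only "foo" is captured.
match = pattern.search("foo bar")
print(match.group("årsag"))   # foo
print(match.groupdict())      # {'årsag': 'foo'}
```

The regex module from the answer can be swapped in by changing the import to `import regex as re`, assuming the package is installed.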

Unicode support in regular expressions when capturing groups in Python
