Question

Using Python 3, a simple script like the one below should run as expected, but it appears to choke on Unicode emote strings:

import re

phrase = "(╯°□°)╯ ︵ ┻━┻"
pattern = r'\b{0}\b'.format(phrase)

text = "The quick brown fox got tired of jumping over dogs and flipped a table: (╯°□°)╯ ︵ ┻━┻"

if re.search(pattern, text, re.IGNORECASE) != None:
    print("Matched!")

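If I substitute "fox" for the contents of the phrase variable, the pattern does match. I've been puzzled as to why it doesn't like this particular string, and my searches through the manual and Stack Overflow haven't resolved the issue. From everything I can tell, Python 3 should handle this without a problem.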

Am I missing something painfully obvious?

Edit: Also, dropping the word boundaries (\b) makes no difference; the string still doesn't match.

Answer 1

(╯°□°)╯ ︵ ┻━┻

This expression contains parentheses, which you need to escape. Otherwise they are interpreted as a group.

In [24]: re.search(r'\(╯°□°\)╯ ︵ ┻━┻', text, re.IGNORECASE)
Out[24]: <_sre.SRE_Match object; span=(72, 85), match='(╯°□°)╯ ︵ ┻━┻'>

In [25]: re.findall(r'\(╯°□°\)╯ ︵ ┻━┻', text, re.IGNORECASE)
Out[25]: ['(╯°□°)╯ ︵ ┻━┻']
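
To see what "interpreted as a group" means in practice, here is a minimal sketch. The names unescaped, without_parens and with_parens are only illustrative and are not part of the original code. Without escaping, the pattern looks for the characters with no literal parentheses around them, so it matches a string the asker never intended and misses the real one:

```python
import re

phrase = "(╯°□°)╯ ︵ ┻━┻"

# Unescaped, "(" and ")" open and close a capturing group, so the pattern
# expects the emote WITHOUT any literal parentheses in the text.
unescaped = re.compile(phrase)

without_parens = "╯°□°╯ ︵ ┻━┻"    # hypothetical input, parentheses removed
with_parens = "(╯°□°)╯ ︵ ┻━┻"     # the string the asker actually wants

m = unescaped.search(without_parens)
print(m is not None)                          # the group-based pattern matches here
print(m.group(1))                             # the text captured by the group
print(unescaped.search(with_parens) is None)  # the real emote is not found
```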

Escape the regex string properly with re.escape and change your code to:

import re

phrase = "(╯°□°)╯ ︵ ┻━┻"
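# re.escape backslash-escapes regex metacharacters such as ( and ),
# so the phrase is matched literally.
pattern = re.escape(phrase)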

text = "The quick brown fox got tired of jumping over dogs and flipped a table: (╯°□°)╯ ︵ ┻━┻"

if re.search(pattern, text, re.IGNORECASE) is not None:
    print("Matched!")

It will then work as expected:

$ python3 a.py
Matched!
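
Note that the corrected code no longer uses \b. As far as I can tell, putting \b back around the escaped phrase would still fail, because \b only matches between a word character and a non-word character, and this phrase starts with "(" and ends with "┻", neither of which is a word character. If you still want boundary-like behaviour, one option is a pair of lookarounds. The sketch below assumes "not directly next to a word character" is the rule you want; the names with_b and bounded are illustrative only:

```python
import re

phrase = "(╯°□°)╯ ︵ ┻━┻"
text = ("The quick brown fox got tired of jumping over dogs "
        "and flipped a table: (╯°□°)╯ ︵ ┻━┻")

escaped = re.escape(phrase)

# \b requires a word character on exactly one side of the position.
# Here "(" follows a space and "┻" is followed by the end of the string,
# so neither position is a word boundary and the search finds nothing.
with_b = re.search(r'\b' + escaped + r'\b', text)
print(with_b is None)

# Assumed alternative: only require that no word character touches the phrase.
bounded = re.compile(r'(?<!\w)' + escaped + r'(?!\w)')
print(bounded.search(text) is not None)      # found in the original text
print(bounded.search("flip" + phrase))       # glued to a word, so no match
```

For plain words such as "fox", the lookaround version behaves the same way as \b, so it can serve phrases of either kind.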

Python 3 Regex and Unicode Emotes
