In Python 3.6, I remove emoji with this regex pattern:
import re

emoji_pattern = re.compile(u"["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]", flags=re.UNICODE)
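It works well, but one emoji still gets through: U+26A1 (https://emojipedia.org/emoji/%E2%9A%A1/). Its code does not look like the ones above, and \U\+26a1 does not work in regex101, so how can I add it to the pattern? Thanks!
Edit: as described above, here is an example (the emoji may not display here):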
mot = '⚡'
mot = emoji_pattern.sub('', mot)
print(mot)
The emoji is still printed.
Answer 0 (score: 3)
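It lives in a different Unicode block, one that Unicode does not set aside for emoji. Add that block's range to your character class: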
"\u2600-\u26FF" # Unicode Block 'Miscellaneous Symbols'
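In a Python string literal, the escape sequence \Uxxxxxxxx requires exactly eight hex digits (32 bits). The lowercase \uxxxx requires exactly four hex digits (16 bits). U+26A1 is therefore written \u26A1 or \U000026A1.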
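The two escape forms can be compared directly. This is a small sketch; the variable names are only for illustration.

```python
# Both escapes (and the literal character) denote the same code point U+26A1.
four_digit = "\u26A1"       # \u takes exactly four hex digits
eight_digit = "\U000026A1"  # \U takes exactly eight hex digits, zero-padded
literal = "⚡"

print(four_digit == eight_digit == literal)  # True
print(hex(ord(literal)))                     # 0x26a1

# A code point above U+FFFF does not fit in four digits, so it needs \U.
grinning = "\U0001F600"
print(len(grinning))                         # 1
```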
Current versions of Python accept Unicode characters directly in string literals, so you can also put the actual character in the regex.
>>> re.sub('⚡', ':zap:', 'AC⚡DC')
'AC:zap:DC'
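Putting the pieces together, here is a minimal sketch of the question's pattern with the Miscellaneous Symbols range appended. The name `emoji_pattern` and the first four ranges come from the question; the sample string is only an illustration.

```python
import re

# Pattern from the question, with the 'Miscellaneous Symbols' block added.
emoji_pattern = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\u2600-\u26FF"          # Miscellaneous Symbols (contains U+26A1)
    "]",
    flags=re.UNICODE,
)

print(emoji_pattern.sub("", "AC⚡DC"))  # ACDC
```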
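Answer 1 (score: 2)
Matching emoji with a regex is a particularly hard problem, especially if you care about false positives. As for your specific problem: the \U notation in Python must be followed by exactly 8 hex digits, so either pad with zeros (\U000026A1) or use the \u notation, \u26A1.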
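To make the false-positive concern concrete, the sketch below compares a whole-block range with a single zero-padded code point. The pattern names and the sample text are illustrative assumptions; U+2605 BLACK STAR sits in the Miscellaneous Symbols block but is not in the Unicode emoji list.

```python
import re

# Broad approach: the whole 'Miscellaneous Symbols' block.
block_pattern = re.compile("[\u2600-\u26FF]")
# Narrow approach: only the high voltage sign, written with a zero-padded \U.
single_pattern = re.compile("[\U000026A1]")

text = "5\u2605 \u26A1 deal"  # "5★ ⚡ deal"

print(repr(block_pattern.sub("", text)))   # '5  deal'   (the star is removed too)
print(repr(single_pattern.sub("", text)))  # '5★  deal'  (only the emoji is removed)
```

Which of the two is preferable depends on whether stripping a few non-emoji symbols is acceptable for the data at hand.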
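Why is it hard to match every emoji? Because they are scattered all over Unicode and there is little that identifies them. An emoji is defined simply by appearing in the list at Unicode.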
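A quick way to see the scattering is to print the code points and names of a few emoji with the standard `unicodedata` module. The four samples below are my own picks and come from four different Unicode blocks.

```python
import unicodedata

# Four characters from the emoji list, taken from four different blocks.
samples = ["\u00A9", "\u26A1", "\u2B50", "\U0001F600"]

described = ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in samples]
for line in described:
    print(line)
# U+00A9 COPYRIGHT SIGN
# U+26A1 HIGH VOLTAGE SIGN
# U+2B50 WHITE MEDIUM STAR
# U+1F600 GRINNING FACE
```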
A JS version that matches all of them is on GitHub. However, it is not easy to convert to Python, because JavaScript represents non-BMP characters differently than Python.
Answer 2 (score: 2)
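The difference comes from UTF-16: a JavaScript string stores a non-BMP character as two 16-bit code units (a surrogate pair), while a Python 3 string stores it as one code point. A small sketch, seen from the Python side:

```python
ch = "\U0001F600"  # GRINNING FACE, outside the Basic Multilingual Plane

# Python 3 strings are sequences of code points, so this is one character.
print(len(ch))  # 1

# In UTF-16, which JavaScript strings use, the same character takes two
# 16-bit code units (a surrogate pair).
utf16 = ch.encode("utf-16-be")
units = [utf16[i:i + 2].hex() for i in range(0, len(utf16), 2)]
print(units)  # ['d83d', 'de00']
```

A JavaScript regex written against surrogate pairs such as `\uD83D\uDE00` therefore has to be rewritten in terms of `\U0001F600` before it can be used with Python's `re`.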
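If you need the full 32-bit emoji regex for V11, here it is. You can generate an up-to-date version with the tool linked here, which runs on Windows.
The high voltage sign is an emoji, and this regex matches it as well.
Regex: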
# Use the 'Mega-Conversion' tool to change into other syntaxes
# -------------------------------------------------------------
[#*0-9] \uFE0F \u20E3
| [\u00A9\u00AE\u203C\u2049\u2122\u2139\u2194-\u2199\u21A9\u21AA\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA\u24C2\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE\u2600-\u2604\u260E\u2611\u2614\u2615\u2618]
| \u261D [\U0001F3FB-\U0001F3FF]?
| [\u2620\u2622\u2623\u2626\u262A\u262E\u262F\u2638-\u263A\u2640\u2642\u2648-\u2653\u265F\u2660\u2663\u2665\u2666\u2668\u267B\u267E\u267F\u2692-\u2697\u2699\u269B\u269C\u26A0\u26A1\u26AA\u26AB\u26B0\u26B1\u26BD\u26BE\u26C4\u26C5\u26C8\u26CE\u26CF\u26D1\u26D3\u26D4\u26E9\u26EA\u26F0-\u26F5\u26F7\u26F8]
| \u26F9
(?:
\uFE0F \u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\u26FA\u26FD\u2702\u2705\u2708\u2709]
| [\u270A-\u270D] [\U0001F3FB-\U0001F3FF]?
| [\u270F\u2712\u2714\u2716\u271D\u2721\u2728\u2733\u2734\u2744\u2747\u274C\u274E\u2753-\u2755\u2757\u2763\u2764\u2795-\u2797\u27A1\u27B0\u27BF\u2934\u2935\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55\u3030\u303D\u3297\u3299\U0001F004\U0001F0CF\U0001F170\U0001F171\U0001F17E\U0001F17F\U0001F18E\U0001F191-\U0001F19A]
| \U0001F1E6 [\U0001F1E8-\U0001F1EC\U0001F1EE\U0001F1F1\U0001F1F2\U0001F1F4\U0001F1F6-\U0001F1FA\U0001F1FC\U0001F1FD\U0001F1FF]
| \U0001F1E7 [\U0001F1E6\U0001F1E7\U0001F1E9-\U0001F1EF\U0001F1F1-\U0001F1F4\U0001F1F6-\U0001F1F9\U0001F1FB\U0001F1FC\U0001F1FE\U0001F1FF]
| \U0001F1E8 [\U0001F1E6\U0001F1E8\U0001F1E9\U0001F1EB-\U0001F1EE\U0001F1F0-\U0001F1F5\U0001F1F7\U0001F1FA-\U0001F1FF]
| \U0001F1E9 [\U0001F1EA\U0001F1EC\U0001F1EF\U0001F1F0\U0001F1F2\U0001F1F4\U0001F1FF]
| \U0001F1EA [\U0001F1E6\U0001F1E8\U0001F1EA\U0001F1EC\U0001F1ED\U0001F1F7-\U0001F1FA]
| \U0001F1EB [\U0001F1EE-\U0001F1F0\U0001F1F2\U0001F1F4\U0001F1F7]
| \U0001F1EC [\U0001F1E6\U0001F1E7\U0001F1E9-\U0001F1EE\U0001F1F1-\U0001F1F3\U0001F1F5-\U0001F1FA\U0001F1FC\U0001F1FE]
| \U0001F1ED [\U0001F1F0\U0001F1F2\U0001F1F3\U0001F1F7\U0001F1F9\U0001F1FA]
| \U0001F1EE [\U0001F1E8-\U0001F1EA\U0001F1F1-\U0001F1F4\U0001F1F6-\U0001F1F9]
| \U0001F1EF [\U0001F1EA\U0001F1F2\U0001F1F4\U0001F1F5]
| \U0001F1F0 [\U0001F1EA\U0001F1EC-\U0001F1EE\U0001F1F2\U0001F1F3\U0001F1F5\U0001F1F7\U0001F1FC\U0001F1FE\U0001F1FF]
| \U0001F1F1 [\U0001F1E6-\U0001F1E8\U0001F1EE\U0001F1F0\U0001F1F7-\U0001F1FB\U0001F1FE]
| \U0001F1F2 [\U0001F1E6\U0001F1E8-\U0001F1ED\U0001F1F0-\U0001F1FF]
| \U0001F1F3 [\U0001F1E6\U0001F1E8\U0001F1EA-\U0001F1EC\U0001F1EE\U0001F1F1\U0001F1F4\U0001F1F5\U0001F1F7\U0001F1FA\U0001F1FF]
| \U0001F1F4 \U0001F1F2
| \U0001F1F5 [\U0001F1E6\U0001F1EA-\U0001F1ED\U0001F1F0-\U0001F1F3\U0001F1F7-\U0001F1F9\U0001F1FC\U0001F1FE]
| \U0001F1F6 \U0001F1E6
| \U0001F1F7 [\U0001F1EA\U0001F1F4\U0001F1F8\U0001F1FA\U0001F1FC]
| \U0001F1F8 [\U0001F1E6-\U0001F1EA\U0001F1EC-\U0001F1F4\U0001F1F7-\U0001F1F9\U0001F1FB\U0001F1FD-\U0001F1FF]
| \U0001F1F9 [\U0001F1E6\U0001F1E8\U0001F1E9\U0001F1EB-\U0001F1ED\U0001F1EF-\U0001F1F4\U0001F1F7\U0001F1F9\U0001F1FB\U0001F1FC\U0001F1FF]
| \U0001F1FA [\U0001F1E6\U0001F1EC\U0001F1F2\U0001F1F3\U0001F1F8\U0001F1FE\U0001F1FF]
| \U0001F1FB [\U0001F1E6\U0001F1E8\U0001F1EA\U0001F1EC\U0001F1EE\U0001F1F3\U0001F1FA]
| \U0001F1FC [\U0001F1EB\U0001F1F8]
| \U0001F1FD \U0001F1F0
| \U0001F1FE [\U0001F1EA\U0001F1F9]
| \U0001F1FF [\U0001F1E6\U0001F1F2\U0001F1FC]
| [\U0001F201\U0001F202\U0001F21A\U0001F22F\U0001F232-\U0001F23A\U0001F250\U0001F251\U0001F300-\U0001F321\U0001F324-\U0001F384]
| \U0001F385 [\U0001F3FB-\U0001F3FF]?
| [\U0001F386-\U0001F393\U0001F396\U0001F397\U0001F399-\U0001F39B\U0001F39E-\U0001F3C1]
| \U0001F3C2 [\U0001F3FB-\U0001F3FF]?
| [\U0001F3C3\U0001F3C4]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F3C5\U0001F3C6]
| \U0001F3C7 [\U0001F3FB-\U0001F3FF]?
| [\U0001F3C8\U0001F3C9]
| \U0001F3CA
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F3CB\U0001F3CC]
(?:
\uFE0F \u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F3CD-\U0001F3F0]
| \U0001F3F3
(?: \uFE0F \u200D \U0001F308 )?
| \U0001F3F4
(?:
\u200D \u2620 \uFE0F
| \U000E0067 \U000E0062
(?:
\U000E0065 \U000E006E \U000E0067
| \U000E0073 \U000E0063 \U000E0074
| \U000E0077 \U000E006C \U000E0073
)
\U000E007F
)?
| [\U0001F3F5\U0001F3F7-\U0001F440]
| \U0001F441
(?: \uFE0F \u200D \U0001F5E8 \uFE0F )?
| [\U0001F442\U0001F443] [\U0001F3FB-\U0001F3FF]?
| [\U0001F444\U0001F445]
| [\U0001F446-\U0001F450] [\U0001F3FB-\U0001F3FF]?
| [\U0001F451-\U0001F465]
| [\U0001F466\U0001F467] [\U0001F3FB-\U0001F3FF]?
| \U0001F468
(?:
\u200D
(?:
[\u2695\u2696\u2708] \uFE0F
| \u2764 \uFE0F \u200D
(?: \U0001F48B \u200D )?
\U0001F468
| [\U0001F33E\U0001F373\U0001F393\U0001F3A4\U0001F3A8\U0001F3EB\U0001F3ED]
| \U0001F466
(?: \u200D \U0001F466 )?
| \U0001F467
(?: \u200D [\U0001F466\U0001F467] )?
| [\U0001F468\U0001F469] \u200D
(?:
\U0001F466
(?: \u200D \U0001F466 )?
| \U0001F467
(?: \u200D [\U0001F466\U0001F467] )?
)
| [\U0001F4BB\U0001F4BC\U0001F527\U0001F52C\U0001F680\U0001F692\U0001F9B0-\U0001F9B3]
)
| [\U0001F3FB-\U0001F3FF]
(?:
\u200D
(?:
[\u2695\u2696\u2708] \uFE0F
| [\U0001F33E\U0001F373\U0001F393\U0001F3A4\U0001F3A8\U0001F3EB\U0001F3ED\U0001F4BB\U0001F4BC\U0001F527\U0001F52C\U0001F680\U0001F692\U0001F9B0-\U0001F9B3]
)
)?
)?
| \U0001F469
(?:
\u200D
(?:
[\u2695\u2696\u2708] \uFE0F
| \u2764 \uFE0F \u200D
(?: \U0001F48B \u200D )?
[\U0001F468\U0001F469]
| [\U0001F33E\U0001F373\U0001F393\U0001F3A4\U0001F3A8\U0001F3EB\U0001F3ED]
| \U0001F466
(?: \u200D \U0001F466 )?
| \U0001F467
(?: \u200D [\U0001F466\U0001F467] )?
| \U0001F469 \u200D
(?:
\U0001F466
(?: \u200D \U0001F466 )?
| \U0001F467
(?: \u200D [\U0001F466\U0001F467] )?
)
| [\U0001F4BB\U0001F4BC\U0001F527\U0001F52C\U0001F680\U0001F692\U0001F9B0-\U0001F9B3]
)
| [\U0001F3FB-\U0001F3FF]
(?:
\u200D
(?:
[\u2695\u2696\u2708] \uFE0F
| [\U0001F33E\U0001F373\U0001F393\U0001F3A4\U0001F3A8\U0001F3EB\U0001F3ED\U0001F4BB\U0001F4BC\U0001F527\U0001F52C\U0001F680\U0001F692\U0001F9B0-\U0001F9B3]
)
)?
)?
| [\U0001F46A-\U0001F46D]
| \U0001F46E
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| \U0001F46F
(?: \u200D [\u2640\u2642] \uFE0F )?
| \U0001F470 [\U0001F3FB-\U0001F3FF]?
| \U0001F471
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| \U0001F472 [\U0001F3FB-\U0001F3FF]?
| \U0001F473
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F474-\U0001F476] [\U0001F3FB-\U0001F3FF]?
| \U0001F477
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| \U0001F478 [\U0001F3FB-\U0001F3FF]?
| [\U0001F479-\U0001F47B]
| \U0001F47C [\U0001F3FB-\U0001F3FF]?
| [\U0001F47D-\U0001F480]
| [\U0001F481\U0001F482]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| \U0001F483 [\U0001F3FB-\U0001F3FF]?
| \U0001F484
| \U0001F485 [\U0001F3FB-\U0001F3FF]?
| [\U0001F486\U0001F487]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F488-\U0001F4A9]
| \U0001F4AA [\U0001F3FB-\U0001F3FF]?
| [\U0001F4AB-\U0001F4FD\U0001F4FF-\U0001F53D\U0001F549-\U0001F54E\U0001F550-\U0001F567\U0001F56F\U0001F570\U0001F573]
| \U0001F574 [\U0001F3FB-\U0001F3FF]?
| \U0001F575
(?:
\uFE0F \u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F576-\U0001F579]
| \U0001F57A [\U0001F3FB-\U0001F3FF]?
| [\U0001F587\U0001F58A-\U0001F58D]
| [\U0001F590\U0001F595\U0001F596] [\U0001F3FB-\U0001F3FF]?
| [\U0001F5A4\U0001F5A5\U0001F5A8\U0001F5B1\U0001F5B2\U0001F5BC\U0001F5C2-\U0001F5C4\U0001F5D1-\U0001F5D3\U0001F5DC-\U0001F5DE\U0001F5E1\U0001F5E3\U0001F5E8\U0001F5EF\U0001F5F3\U0001F5FA-\U0001F644]
| [\U0001F645-\U0001F647]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F648-\U0001F64A]
| \U0001F64B
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| \U0001F64C [\U0001F3FB-\U0001F3FF]?
| [\U0001F64D\U0001F64E]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| \U0001F64F [\U0001F3FB-\U0001F3FF]?
| [\U0001F680-\U0001F6A2]
| \U0001F6A3
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F6A4-\U0001F6B3]
| [\U0001F6B4-\U0001F6B6]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F6B7-\U0001F6BF]
| \U0001F6C0 [\U0001F3FB-\U0001F3FF]?
| [\U0001F6C1-\U0001F6C5\U0001F6CB]
| \U0001F6CC [\U0001F3FB-\U0001F3FF]?
| [\U0001F6CD-\U0001F6D2\U0001F6E0-\U0001F6E5\U0001F6E9\U0001F6EB\U0001F6EC\U0001F6F0\U0001F6F3-\U0001F6F9\U0001F910-\U0001F917]
| [\U0001F918-\U0001F91C] [\U0001F3FB-\U0001F3FF]?
| \U0001F91D
| [\U0001F91E\U0001F91F] [\U0001F3FB-\U0001F3FF]?
| [\U0001F920-\U0001F925]
| \U0001F926
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F927-\U0001F92F]
| [\U0001F930-\U0001F936] [\U0001F3FB-\U0001F3FF]?
| \U0001F937
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F938\U0001F939]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| \U0001F93A
| \U0001F93C
(?: \u200D [\u2640\u2642] \uFE0F )?
| [\U0001F93D\U0001F93E]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F940-\U0001F945\U0001F947-\U0001F970\U0001F973-\U0001F976\U0001F97A\U0001F97C-\U0001F9A2\U0001F9B0-\U0001F9B4]
| [\U0001F9B5\U0001F9B6] [\U0001F3FB-\U0001F3FF]?
| \U0001F9B7
| [\U0001F9B8\U0001F9B9]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F9C0-\U0001F9C2\U0001F9D0]
| [\U0001F9D1-\U0001F9D5] [\U0001F3FB-\U0001F3FF]?
| \U0001F9D6
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F9D7-\U0001F9DD]
(?:
\u200D [\u2640\u2642] \uFE0F
| [\U0001F3FB-\U0001F3FF]
(?: \u200D [\u2640\u2642] \uFE0F )?
)?
| [\U0001F9DE\U0001F9DF]
(?: \u200D [\u2640\u2642] \uFE0F )?
| [\U0001F9E0-\U0001F9FF]
Unicode character version: the same pattern can also be written with the literal Unicode characters in place of the \u and \U escapes.
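One way to use the free-spacing pattern above from Python is the `re.VERBOSE` flag, which ignores the whitespace and `#` comments outside character classes. Python's `re` module also understands `\uXXXX` and `\UXXXXXXXX` escapes inside a raw pattern string. The sketch below uses only a short excerpt of the regex; `excerpt` and `emoji_re` are illustrative names, and the full pattern would be pasted in the same way.

```python
import re

# A short excerpt of the pattern above, kept in its free-spacing layout.
excerpt = r"""
    [#*0-9] \uFE0F \u20E3               # keycap sequences
  | \u261D [\U0001F3FB-\U0001F3FF]?     # pointing hand, optional skin tone
  | [\u26A0\u26A1\u26AA\u26AB]          # includes U+26A1 HIGH VOLTAGE SIGN
  | [\U0001F9E0-\U0001F9FF]
"""

emoji_re = re.compile(excerpt, re.VERBOSE)

print(emoji_re.sub("", "AC\u26A1DC"))            # ACDC
print(emoji_re.sub("", "1\uFE0F\u20E3 ok"))      # " ok"
```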