我需要处理的Emojis是“❤️ “

Question

import re

b="united thats weak. See ya "
print b.decode('utf-8')  #output: u'united thats weak. See ya \U0001f44b'

print re.findall(r'[\U0001f600-\U0001f650]',b.decode('utf-8'),flags=re.U) # output: [u'S']

如何获得输出\U0001f44b。请帮忙

我需要处理的Emojis是“❤️ “

Answer 1

搜索unicode范围与搜索任何类型的字符范围完全相同。但是，您需要正确表示字符串。这是一个有效的例子：

#coding: utf-8
import re

b=u"united thats weak. See ya  "
assert re.findall(u'[\U0001f600-\U0001f650]',b) == [u'']
assert re.findall(ur'[-]',b) == [u'']

注意：

您的计划的第一行或第二行需要#coding: utf-8或类似内容。
在您的示例中，您使用的表情符号U-1f44b不在U-1f600到U-1f650的范围内。在我的例子中，我使用了一个。
如果您想使用\U包含unicode字符，则无法使用原始字符串前缀（r''）。
但是如果您使用字符本身（而不是\U转义），那么您可以使用原始字符串前缀。
您需要确保模式和输入字符串都是unicode字符串。它们都不是UTF8编码的字符串。
但除非您的模式包含re.U，\s或类似内容，否则您不需要\w标记。

在python中找到字符串中unicodes的所有匹配项

我需要处理的Emojis是“❤️ < / EM> < EM> < / EM> “

1 个答案: