In Python 3

Date: 2017-12-08 14:20:31

Tags: python unicode formatting emoji

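I'm trying to convert emoji to Unicode in Python 3. For example, given the emoji 😀, I want to get the corresponding Unicode "U+1F600". Likewise, I want to convert 'U+1F600' back to the emoji. I've read the documentation and tried several options, but Python's behavior confuses me.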

>>> x = '😀'
>>> y = x.encode('utf-8')
>>> y
b'\xf0\x9f\x98\x80'

The emoji is converted to a bytes object.

>>> z = y.decode('utf-8')
>>> z
'😀'

Decoding the bytes object gives back the emoji. So far, so good.

Now, using the Unicode escape for the emoji instead:

>>> c = '\U0001F600'
>>> d = c.encode('utf-8')
>>> d
b'\xf0\x9f\x98\x80'

Again, this prints the byte encoding.

>>> d.decode('utf-8')
'😀'

And again, this prints the emoji. I really can't figure out how to convert between Unicode and emoji.

2 answers:

Answer 0 (score: 9)

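'😀' is already a Unicode object. UTF-8 is not Unicode; it is a byte encoding of Unicode. To get the code point number of a Unicode character, use the ord function. To print it in the form you want, format that number as hexadecimal, like this: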

s = '😀'
print('U+{:X}'.format(ord(s)))

Output

U+1F600

If you're using Python 3.6+, you can use an f-string to make it shorter (and more efficient):

s = '😀'
print(f'U+{ord(s):X}')
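The question also asks for the reverse direction ('U+1F600' back to the emoji), which the ord-based snippet doesn't cover. A minimal sketch, assuming space-separated "U+XXXX" tokens as input: chr is the inverse of ord, and int(..., 16) parses the hex digits. The helper names to_codepoints and from_codepoints are hypothetical. The sketch also iterates character by character, because ord only accepts a single code point: an emoji with a skin-tone modifier such as 👍🏽 is two code points, and ord('👍🏽') raises TypeError.

```python
def to_codepoints(text):
    """Format every code point in `text` as U+XXXX (hypothetical helper)."""
    # :04X pads to at least four hex digits, the usual U+ notation;
    # longer values such as 1F600 are not truncated.
    return ' '.join(f'U+{ord(ch):04X}' for ch in text)


def from_codepoints(spec):
    """Inverse of to_codepoints: turn 'U+XXXX' tokens back into characters."""
    # Drop the 'U+' prefix of each token and parse the rest as hexadecimal.
    return ''.join(chr(int(token[2:], 16)) for token in spec.split())


print(to_codepoints('\U0001F600'))                      # U+1F600
print(from_codepoints('U+1F600') == '\U0001F600')       # True
print(to_codepoints('\U0001F44D\U0001F3FD'))            # U+1F44D U+1F3FD
```

Padding to four digits is a choice made here, not something the answer requires; with the answer's plain {:X}, 'A' would print as U+41 instead of U+0041.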

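By the way, if you want to create a Unicode escape sequence like '\U0001F600', that's what the 'unicode-escape' codec is for. However, it returns a bytes string, which you will probably want to convert back to text. You could use the 'UTF-8' codec for that, but the 'ASCII' codec also works, because the result is guaranteed to contain only valid ASCII.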

s = '😀'
print(s.encode('unicode-escape'))
print(s.encode('unicode-escape').decode('ASCII'))

Output

b'\\U0001f600'
\U0001f600
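Going the other way, from the escape text back to the emoji, needs the same two codecs in reverse order: encode the ASCII text to bytes, then decode those bytes with 'unicode-escape'. A minimal sketch follows. It is only safe on pure-ASCII input: the 'unicode-escape' codec decodes any byte outside an escape sequence as Latin-1, so passing it the UTF-8 bytes of non-ASCII text would produce mojibake.

```python
s = '\U0001F600'

# Forward: text -> bytes with backslash escapes -> ASCII-only text.
escaped = s.encode('unicode-escape').decode('ASCII')
print(escaped)        # \U0001f600

# Reverse: ASCII text -> bytes -> let 'unicode-escape' interpret \U0001f600.
restored = escaped.encode('ASCII').decode('unicode-escape')
print(restored == s)  # True
```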

I suggest reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), a short article by Stack Overflow co-founder Joel Spolsky.

Answer 1 (score: 1)

sentence = "Head-Up Displays (HUD)💻 for #automotive🚗 sector\n \nThe #UK-based #startup🚀 Envisics got €42 million #funding💰 from l… "
print("normal sentence - ", sentence)

uc_sentence = sentence.encode('unicode-escape')
print("\n\nunicode represented sentence - ", uc_sentence)

decoded_sentence = uc_sentence.decode('unicode-escape')
print("\n\ndecoded sentence - ", decoded_sentence)

Output

normal sentence -  Head-Up Displays (HUD)💻 for #automotive🚗 sector
 
The #UK-based #startup🚀 Envisics got €42 million #funding💰 from l… 


unicode represented sentence -  b'Head-Up Displays (HUD)\\U0001f4bb for #automotive\\U0001f697 sector\\n \\nThe #UK-based #startup\\U0001f680 Envisics got \\u20ac42 million #funding\\U0001f4b0 from l\\u2026 '


decoded sentence -  Head-Up Displays (HUD)💻 for #automotive🚗 sector
 
The #UK-based #startup🚀 Envisics got €42 million #funding💰 from l…
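The escaped output above shows the code point of every non-ASCII character in the sentence, but it is hard to read. A minimal sketch, not part of the answer, that pairs each non-ASCII character's U+XXXX value with its official Unicode name using the standard-library unicodedata.name. The helper name describe_non_ascii and the "<unnamed>" fallback are illustrative assumptions.

```python
import unicodedata
from typing import List


def describe_non_ascii(text: str) -> List[str]:
    """Return 'U+XXXX NAME' for each non-ASCII character in `text` (illustrative helper)."""
    return [
        # The second argument is returned instead of raising ValueError
        # when a character has no name in the Unicode database.
        f'U+{ord(ch):04X} {unicodedata.name(ch, "<unnamed>")}'
        for ch in text
        if ord(ch) > 0x7F
    ]


sentence = "The #startup\U0001f680 got \u20ac42 million"
for line in describe_non_ascii(sentence):
    print(line)
# U+1F680 ROCKET
# U+20AC EURO SIGN
```

Character names come from the Unicode database bundled with the running Python version, so very recently added emoji may only be named on newer interpreters.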