Python: convert Unicode-hex UTF-8 strings to Unicode strings

Time: 2011-09-30 11:36:46

Tags: python unicode utf-8

I have s = u'Gaga\xe2\x80\x99s' but need to convert it to t = u'Gaga\u2019s'.

What is the best way to achieve this?

3 answers:

Answer 0 (score: 8)

s = u'Gaga\xe2\x80\x99s'
t = u'Gaga\u2019s'
x = s.encode('raw-unicode-escape').decode('utf-8')
assert x==t

print(x)

yields

Gaga’s

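Answer 1 (score: 7)

Whatever decoded your original string most likely decoded it as latin-1 or a close relative. Since latin-1 maps onto the first 256 code points of Unicode, encoding back to latin-1 recovers the original bytes, which can then be decoded as UTF-8. So this works: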

>>> s = u'Gaga\xe2\x80\x99s'
>>> s.encode('latin-1').decode('utf8')
u'Gaga\u2019s'
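
To make the round trip explicit, here is a minimal Python 3 sketch of how such a string likely comes about and how the latin-1 step undoes it. The helper name fix_mojibake is made up for illustration and is not part of any library; it assumes the wrong decode really was latin-1 and returns the input unchanged if the round trip fails.

```python
def fix_mojibake(text):
    """Undo a UTF-8 byte sequence that was wrongly decoded as latin-1.

    Hypothetical helper: returns the input unchanged when the text cannot
    be encoded as latin-1 or the resulting bytes are not valid UTF-8.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "Gaga\u2019s"

# Reproduce the problem: UTF-8 bytes decoded with the wrong codec.
garbled = original.encode("utf-8").decode("latin-1")
assert garbled == "Gaga\xe2\x80\x99s"

# Reverse it: latin-1 gives the bytes back, UTF-8 decodes them properly.
fixed = fix_mojibake(garbled)
assert fixed == original

# Text that was never garbled passes through untouched.
assert fix_mojibake(original) == original
```

If the bad decode used cp1252 rather than latin-1, the same idea should apply with that codec name instead, although a few byte values have no cp1252 mapping and would need separate handling.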

Answer 2 (score: 2)

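import codecs

# Python 2 code: unicode() does not exist in Python 3.
s = u"Gaga\xe2\x80\x99s"
# With no mapping argument, charmap_encode turns each code point below 256
# into the byte of the same value, which matches latin-1.
s_as_str = codecs.charmap_encode(s)[0]
t = unicode(s_as_str, "utf-8")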
print repr(t)  # repr shows the escaped form; plain print t would show Gaga’s

prints

u'Gaga\u2019s'