Suppose I have a Unicode string (not real Unicode, but a string that looks like Unicode escape sequences), and I want to get its UTF-8 form. How can I do this in Python? For example, if I have a string like:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
how can I get its UTF-8 variant (Georgian characters):
ისრაელი==იერუსალიმი
Simply put, I want to have code like:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utfTitle = title.TurnToUTF()
print(utfTitle)
and I expect this code to output:
ისრაელი==იერუსალიმი
Answer 0 (score: 4)
Here you go. Just use the decode method with the unicode_escape codec.
For Python 2.x:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utfTitle = title.decode('unicode_escape')
print(utfTitle)
# output: ისრაელი == იერუსალიმი
For Python 3.x:
title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
print(title.encode('ascii').decode('unicode-escape'))
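The Python 3 line above can be wrapped in a small helper. This is only a sketch: the name unescape_unicode is made up for illustration, and it assumes the input contains nothing but ASCII characters and literal backslash-u escapes. Note that the result is an ordinary str; calling .encode("utf-8") on it gives the actual UTF-8 bytes.

```python
def unescape_unicode(text):
    # Hypothetical helper: turns literal \uXXXX sequences in an
    # ASCII-only str into the characters they stand for.
    # encode("ascii") raises UnicodeEncodeError if text already contains
    # non-ASCII characters, so that limitation fails loudly.
    return text.encode("ascii").decode("unicode_escape")


title = "\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
utf_title = unescape_unicode(title)      # a str of Georgian characters
utf8_bytes = utf_title.encode("utf-8")   # the UTF-8 byte sequence for that str
print(utf_title)
```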
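Answer 1 (score: 3)
You can use the unicode-escape codec to get rid of the doubled backslashes and work with the string properly.
Assuming title is a str, you need to encode the string first before decoding it back to Unicode (str).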
>>> t = title.encode('utf-8').decode('unicode-escape')
>>> t
'ისრაელი == იერუსალიმი'
If title is a bytes instance, you can decode it directly:
>>> t = title.decode('unicode-escape')
>>> t
'ისრაელი == იერუსალიმი'
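Both cases can be handled in one place by checking the type first. The following is a minimal sketch, not part of the answer itself: decode_escapes is a hypothetical name, and it encodes with ASCII rather than UTF-8, which gives the same result for pure-ASCII input such as the title above.

```python
def decode_escapes(value):
    # Hypothetical helper: accepts either str or bytes.
    if isinstance(value, str):
        # A str has no decode() in Python 3, so go through bytes first.
        value = value.encode("ascii")
    return value.decode("unicode_escape")


as_str = "\\u10d8\\u10e1\\u10e0"     # str holding literal backslash-u escapes
as_bytes = b"\\u10d8\\u10e1\\u10e0"  # the same content as bytes

from_str = decode_escapes(as_str)
from_bytes = decode_escapes(as_bytes)
```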
Answer 2 (score: 0)
Assuming the value is of type str, encode it and then convert it using the decode method with the unicode-escape codec:
title="\\u10d8\\u10e1\\u10e0\\u10d0\\u10d4\\u10da\\u10d8 == \\u10d8\\u10d4\\u10e0\\u10e3\\u10e1\\u10d0\\u10da\\u10d8\\u10db\\u10d8"
res1 = title.encode('utf-8')
res2 = res1.decode('unicode-escape')
print(res2)
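One caveat with the encode('utf-8') route: the unicode-escape codec reads non-escape bytes as Latin-1, so a string that mixes real non-ASCII characters with escaped ones comes out mangled. If that situation can occur, a regular expression that touches only the escapes is one alternative. This is a sketch of a different technique, not what the answers above use; replace_u_escapes is a made-up name, and it handles only four-digit \u escapes.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def replace_u_escapes(text):
    # Replaces only literal \uXXXX sequences; every other character is
    # left untouched. Does not combine surrogate pairs or handle \UXXXXXXXX.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


mixed = "\u10d8 \\u10d8"  # a real character followed by an escaped one
fixed = replace_u_escapes(mixed)
# For comparison: the codec route garbles the character that was already real.
mangled = mixed.encode("utf-8").decode("unicode_escape")
```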