Question

Before asking my question, let me give an example.

u_string = u'\xcb\xa5\xb5'
u_string
Out[79]: 'Ë¥µ'
asc_string = ascii(u_string)
asc_string
Out[81]: "'\\xcb\\xa5\\xb5'"

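Here I end up with a string (asc_string) that contains only ASCII characters.

My question: if all I have is asc_string, how do I convert it back to the original u_string (the Unicode string)?

Thanks, Martin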

Answer 1

In this case, the simplest fully correct approach is ast.literal_eval:

>>> import ast
>>> origversion = u'\xcb\xa5\xb5'  # Leading u is unnecessary on Python 3
>>> asciiform = ascii(origversion)
>>> origversion == ast.literal_eval(asciiform)
True

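This works because calling ascii on a str adds quotes and escapes, producing a string that contains a string literal which would reproduce the original string (it is just repr, restricted to ASCII characters in the result). ast.literal_eval is designed to parse the canonical repr of a literal (ASCII-escaped or not) and produce the resulting object, in this case a str.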
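To check that this holds beyond the single example, here is a small round-trip sketch. The helper name restore and the sample strings are made up for illustration and are not part of the question; str.isascii assumes Python 3.7 or later.

```python
import ast

def restore(ascii_repr):
    """Hypothetical helper: parse the output of ascii() back into a str."""
    value = ast.literal_eval(ascii_repr)
    # literal_eval accepts any literal, so reject anything that is not a str.
    if not isinstance(value, str):
        raise TypeError("expected the ascii() form of a str")
    return value

# Illustrative samples: quotes, backslashes, control characters, non-Latin text.
samples = [
    '\xcb\xa5\xb5',
    "it's",
    'say "hi"',
    'back\\slash',
    'tab\there',
    '\u4e2d\u6587',
    '\U0001f600',
]

for original in samples:
    encoded = ascii(original)
    assert encoded.isascii()             # only ASCII characters remain
    assert restore(encoded) == original  # and the original string comes back
```

Unlike eval, ast.literal_eval only accepts Python literals, so it does not execute arbitrary code found in the input.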

Answer 2

You can decode it like this:

asc_string.encode().decode( 'unicode-escape' )  
# "'Ë¥µ'"

I don't know why, but ascii adds a pair of quotes, which you can strip like this:

asc_string.encode().decode( 'unicode-escape' )[1:-1]
# 'Ë¥µ'
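On where the quotes come from: ascii() returns the same kind of string as repr(), only with non-ASCII characters escaped, so the quotes are part of the string literal it produces. A short sketch of that relationship follows; restore_by_decoding is a hypothetical name wrapping the expression above.

```python
u_string = '\xcb\xa5\xb5'

# repr() keeps printable non-ASCII characters, ascii() escapes them.
assert repr(u_string) == "'\xcb\xa5\xb5'"
assert ascii(u_string) == "'\\xcb\\xa5\\xb5'"

# For pure-ASCII input the two are identical, quotes included.
assert ascii('abc') == repr('abc') == "'abc'"

def restore_by_decoding(ascii_repr):
    """Hypothetical helper: undo the escapes, then drop the surrounding quotes."""
    return ascii_repr.encode('ascii').decode('unicode-escape')[1:-1]

assert restore_by_decoding(ascii(u_string)) == u_string
```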

How do I recover a Unicode string from its ASCII string in Python?
