How to convert a string containing unicode escapes \u#### to a utf-8 string

Time: 2018-03-16 08:12:18

Tags: python python-3.x unicode python-unicode

I have been trying to get this working since this morning.

My sample.txt:

choice = \u9078\u629e

Code:

with open('sample.txt', encoding='utf-8') as f:
    for line in f:
        print(line)
        print("選択" in line)
        print(line.encode('utf-8').decode('utf-8'))
        print(line.encode().decode('utf-8'))
        print(line.encode('utf-8').decode())
        print(line.encode().decode('unicode-escape').encode("latin-1").decode('utf-8')) # as suggested.

out:
choice = \u9078\u629e
False
choice = \u9078\u629e
choice = \u9078\u629e
choice = \u9078\u629e
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 9-10: ordinal not in range(256)

When I do this in the ipython qtconsole:

In [29]: "choice = \u9078\u629e"
Out[29]: 'choice = 選択'

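So the question is: how do I read a text file that contains unicode escape sequences such as \u9078\u629e (I don't know exactly what they are called) and convert them to utf-8 text such as 選択?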

1 answer:

Answer 0: (score: 2)

If you are reading it from a file, specify the encoding when you open it:

with open('test.txt', encoding='unicode-escape') as f:
    a = f.read()
print(a)

# choice = 選択

test.txt contains:

choice = \u9078\u629e
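To try the file-based approach without preparing test.txt by hand, here is a minimal self-contained sketch. The helper name read_unicode_escaped and the temporary file are illustrative assumptions, not part of the original answer; the only thing taken from the answer is opening the file with encoding='unicode-escape'.

```python
import os
import tempfile


def read_unicode_escaped(path):
    # Hypothetical helper: open the file with the unicode-escape codec so
    # that backslash-u sequences are turned into the characters they name.
    with open(path, encoding='unicode-escape') as f:
        return f.read()


# Create a throwaway file that holds the literal backslash-u sequences.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.txt')
    with open(path, 'w', encoding='ascii') as f:
        f.write('choice = \\u9078\\u629e\n')
    text = read_unicode_escaped(path)

print(text)  # choice = 選択
```

Note that the unicode-escape codec treats any byte that is not part of an escape as Latin-1, so this is only safe when the rest of the file is plain ASCII.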

If you already have the text in a string, you can convert it like this:

a = "choice = \\u9078\\u629e"
a.encode().decode('unicode-escape')
# 'choice = 選択'
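One caveat with the encode/decode round trip: if the string also contains real non-ASCII characters, they come out garbled, because unicode-escape decodes the raw bytes as Latin-1. A possible alternative, sketched here as an assumption rather than as part of the original answer, is to replace only the \uXXXX sequences with a regular expression. The name decode_u_escapes is hypothetical.

```python
import re

# Matches a backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def decode_u_escapes(s):
    # Hypothetical helper: replace each four-digit escape with the character
    # it names and leave every other character untouched.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


mixed = 'café = \\u9078\\u629e'
print(decode_u_escapes(mixed))                  # café = 選択
print(mixed.encode().decode('unicode-escape'))  # cafÃ© = 選択
```

This sketch handles only four-digit \u escapes. It does not combine surrogate pairs, and it does not treat a doubled backslash as an escaped backslash, so it fits simple key = value data like the sample above rather than arbitrary escaped text.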