Question

# -*- coding: utf-8 -*-
d = {}
with open('transl.txt', 'r') as f:
    for line in f:  
        (key, val) = line.split(' = ')
        d[key] = val

print d

Here are the contents of transl.txt (saved with ANSI encoding):

send = button
addr = аддрес

When I run the program, I get this output:

{'addr': '\xe0\xe4\xe4\xf0\xe5\xf1', 'send': 'button\n'}
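The escaped value in that output is not corrupted data. It is the `repr` of the raw bytes read from the file, which is what printing a dict shows. The minimal Python 3 sketch below assumes (as Answer 1 does) that "ANSI" here means the Windows Cyrillic code page windows-1251, and shows that decoding those exact bytes recovers the word:

```python
# Bytes exactly as shown in the question's output for the 'addr' key.
raw = b'\xe0\xe4\xe4\xf0\xe5\xf1'

# Assumption: the "ANSI" file uses the Windows Cyrillic code page (windows-1251).
text = raw.decode('windows-1251')

print(repr(raw))   # escaped form, similar to the question's output
print(text)        # the readable Cyrillic word
```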

Answer 1

# -*- coding: utf-8 -*-
d = {}
with open('transl.txt', 'r') as f:
    for line in f:  
        (key, val) = line.split(' = ')
        d[key] = val.decode("windows-1251")
# Now the values are unicode strings.
# This may or may not be what you want. To convert them back to
# byte strings in a given encoding, use the unicode string's
# .encode(<encoding-name>) method.
print d
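For readers on Python 3, where `print d` is a syntax error and `str` is already Unicode, a rough equivalent of Answer 1 passes the encoding to `open()` instead of calling `.decode()` on each value. This is a sketch, not the answerer's code: the helper name `load_translations` is hypothetical, windows-1251 is an assumption about the file, and the sketch also strips the trailing newline that appears as `'button\n'` in the question:

```python
import os
import tempfile


def load_translations(path, encoding='windows-1251'):
    """Read 'key = value' lines from a text file into a dict of str.

    Hypothetical helper; assumes the file is windows-1251
    ("ANSI" on a Cyrillic Windows setup).
    """
    d = {}
    # Passing encoding= makes open() decode every line to str for us.
    with open(path, 'r', encoding=encoding) as f:
        for line in f:
            line = line.rstrip('\r\n')    # drop the line ending ('button\n' -> 'button')
            if ' = ' not in line:
                continue                  # skip blank or malformed lines
            key, val = line.split(' = ', 1)
            d[key] = val
    return d


# Demo: recreate the question's transl.txt as windows-1251 bytes in a temp directory.
path = os.path.join(tempfile.mkdtemp(), 'transl.txt')
with open(path, 'wb') as f:
    f.write('send = button\naddr = аддрес'.encode('windows-1251'))

translations = load_translations(path)
print(translations)  # in Python 3 the Cyrillic value prints readably
```

Splitting with `maxsplit=1` keeps any further ` = ` inside the value instead of raising a "too many values to unpack" error.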

Answer 2

You can use codecs.open() from the standard library.

import codecs
d = {}
with codecs.open('transl.txt', encoding='maccyrillic') as f:
    for line in f:  
        (key, val) = line.split(u' = ')
        d[key] = val
print d['send']

You can find the list of standard codecs in the documentation. Your input looks like maccyrillic.
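This particular sample cannot tell maccyrillic and windows-1251 apart: both code pages place the lowercase Cyrillic letters used in "аддрес" at the same byte values, so either codec decodes it correctly. A small Python 3 check (the shortlist of candidate codecs is an assumption; the names are standard Python codec names) decodes the question's bytes with several Cyrillic codecs so the results can be compared:

```python
raw = b'\xe0\xe4\xe4\xf0\xe5\xf1'  # the 'addr' value from the question

# Assumed shortlist of Cyrillic codecs from Python's standard codec list.
candidates = ['windows-1251', 'mac_cyrillic', 'koi8_r', 'cp866']

results = {}
for name in candidates:
    try:
        results[name] = raw.decode(name)
    except UnicodeDecodeError:
        results[name] = None  # the bytes are not valid in this codec
    print(name, results[name])
```

A file containing uppercase letters or other characters would be needed to distinguish the first two codecs; the question's "ANSI" label, which on a Russian-locale Windows means windows-1251, is another hint.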

Answer 3

Your terminal must be misconfigured; set the LANG and LC_ALL environment variables to en_US.UTF-8 or the Unicode equivalent for your language.
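To see which encodings Python derives from those settings, the Python 3 sketch below prints them. Keep in mind that the terminal encoding only affects printing text itself: printing a whole dict shows the `repr` of its contents, and the `repr` of a byte string always escapes non-ASCII bytes (as in the question's output) regardless of LANG and LC_ALL.

```python
import locale
import sys

# Encoding Python uses when writing str to stdout (derived from the locale/terminal).
print('stdout encoding:', sys.stdout.encoding)
# Encoding open() typically uses when encoding= is omitted (UTF-8 mode can override it).
print('preferred encoding:', locale.getpreferredencoding(False))

word = 'аддрес'
print(word)                         # needs a terminal that can display Cyrillic
print(repr(word))                   # Python 3 keeps printable non-ASCII in str repr
print(repr(word.encode('utf-8')))   # bytes repr is always escaped
```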

How do I make non-English letters readable in Python?
