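I have a file in which each line contains the subject of an email, possibly encoded in UTF-8 or Big5. I want to decode these subjects in Python and print them to a file. I am using Python's email.header.decode_header. The following code, which prints to a file, does not work: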
from email.header import decode_header
f = open("subjects.txt", 'r')
g = open("transformed_subjects", 'w')
for line in f:
    dh = decode_header(line)
    print >> g, ' '.join(s.decode(enc or 'ascii') for s,enc in dh )
f.close()
g.close()
I get the following error:
Traceback (most recent call last):
  File "read_subjects.py", line 8, in <module>
    print >> g, ' '.join(s.decode(enc or 'ascii') for s,enc in dh )
UnicodeEncodeError: 'ascii' codec can't encode characters in position 68-74: ordinal not in range(128)
But when I print to stdout, everything works fine:
from email.header import decode_header
f = open("subjects.txt", 'r')
for line in f:
    dh = decode_header(line)
    print ' '.join(s.decode(enc or 'ascii') for s,enc in dh )
f.close()
What is the difference?
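For reference, here is a minimal sketch of one way the file case might be made to work, assuming the difference is that Python 2 encodes unicode text with the terminal's encoding when printing to stdout but falls back to ASCII for a file opened with plain open(). It opens the output through io.open with an explicit encoding instead. The helper names decode_subject and write_subjects, and the choice of UTF-8 for the output, are assumptions made for illustration only.

```python
# -*- coding: utf-8 -*-
import io
from email.header import decode_header


def decode_subject(line):
    """Return one RFC 2047 encoded subject line as a unicode string."""
    parts = []
    for s, enc in decode_header(line.rstrip('\r\n')):
        # Encoded words come back as bytes and must be decoded with
        # their declared charset; plain text may already be decoded.
        if isinstance(s, bytes):
            s = s.decode(enc or 'ascii')
        parts.append(s)
    return u' '.join(parts)


def write_subjects(lines, dst_path, out_encoding='utf-8'):
    """Write decoded subjects to dst_path, one per line.

    out_encoding is an assumption: choose whatever the consumer expects.
    """
    # io.open encodes unicode text on write, so no implicit ASCII
    # conversion takes place.
    with io.open(dst_path, 'w', encoding=out_encoding) as g:
        for line in lines:
            g.write(decode_subject(line) + u'\n')
```

With this in place, something like write_subjects(open("subjects.txt"), "transformed_subjects") would take the place of the print >> g loop.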