I am trying to print Chinese text to a file. When I print it to the terminal, it looks correct. When I use print >> filename...
I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 24: ordinal not in range(128)
I don't know what else I need to do. I have already encoded all the text data to UTF-8
and used string formatting.
This is my code:
# -*- coding: utf-8 -*-
import codecs
import string

exclude = string.punctuation

def get_documents_cleaned(topics,filename):
    c = codecs.open(filename, "w", "utf-8")
    for top in topics:
        print >> c , "num" , "," , "text" , "," , "id" , "," , top
        document_results = proj.get('docs',text=top)['results']
        for doc in document_results:
            print "{0}, {1}, {2}, {3}".format(doc[1], (doc[0]['text']).encode('utf-8').translate(None,exclude), doc[0]['document']['id'], top.encode('utf-8'))
            print >> c , "{0}, {1}, {2}, {3}".format(doc[1], (doc[0]['text']).encode('utf-8').translate(None,exclude), doc[0]['document']['id'], top.encode('utf-8'))

get_documents_cleaned(my_topics,"export_c.csv")
print doc[0]['text']
looks like this:
u' \u3001 \u6010...'
Answer 0 (score: 0)
Since your first print statement works, it is clearly not the format function that raises the UnicodeDecodeError.
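Instead, the problem lies with the file writer. c expects a unicode object but receives only a UTF-8 encoded str object (let's call it s). So c implicitly tries to call s.decode(), which falls back to the ascii codec and raises the UnicodeDecodeError.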
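To make that implicit step visible, here is a small sketch that performs the same conversion explicitly. The helper name ascii_decode_fails is made up for illustration, and the sample character is the \u3001 from the question, whose UTF-8 encoding starts with the byte 0xe3 named in the traceback.

```python
# UTF-8 bytes for U+3001; the first byte is 0xe3, as in the error message.
s = u"\u3001".encode("utf-8")


def ascii_decode_fails(data):
    """Return True if decoding `data` as ASCII raises UnicodeDecodeError."""
    try:
        data.decode("ascii")  # explicit form of the implicit conversion
    except UnicodeDecodeError:
        return True
    return False


# Decoding with the codec that produced the bytes round-trips cleanly.
text = s.decode("utf-8")
```

The same bytes decode fine with 'utf-8', which is why the choice of codec, not the data, is the problem.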
You can solve the problem by calling s.decode('utf-8') before printing, or by using Python's default open(filename, "w") function instead of codecs.open.
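A minimal sketch of the first approach, written so that it should also run on Python 3: keep every value as unicode and let the file object do the encoding once, on write. The names strip_punctuation and write_rows, the header text, and the sample row layout are assumptions standing in for the proj.get(...) results in the question, and io.open is swapped in for codecs.open.

```python
import io
import string

# ASCII punctuation only, as in the question; CJK punctuation is left alone.
EXCLUDE = set(string.punctuation)


def strip_punctuation(text):
    """Remove ASCII punctuation from a unicode string without encoding it."""
    return u"".join(ch for ch in text if ch not in EXCLUDE)


def write_rows(rows, filename):
    """Write (num, text, doc_id, topic) tuples of unicode values as UTF-8."""
    # The file object encodes to UTF-8, so no value is encoded by hand.
    with io.open(filename, "w", encoding="utf-8") as out:
        out.write(u"num,text,id,topic\n")
        for num, text, doc_id, topic in rows:
            line = u"{0}, {1}, {2}, {3}\n".format(
                num, strip_punctuation(text), doc_id, topic)
            out.write(line)
```

A call might look like write_rows([(1, doc_text, doc_id, top)], "export_c.csv"), where every string is still a unicode object. The point of the design is that nothing calls .encode('utf-8') before the data reaches the writer.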