Question

I am trying to print Chinese text to a file. When I print it to the terminal, it looks correct. When I use print >> filename, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 24: ordinal not in range(128)

I don't know what else I need to do. I already encoded all textual data to UTF-8 and used string formatting.

This is my code:

# -*- coding: utf-8 -*-
import codecs
import string

exclude = string.punctuation

def get_documents_cleaned(topics, filename):
    c = codecs.open(filename, "w", "utf-8")
    for top in topics:
        print >> c, "num", ",", "text", ",", "id", ",", top
        document_results = proj.get('docs', text=top)['results']  # proj is defined elsewhere
        for doc in document_results:
            # Printing the line to the terminal works:
            print "{0}, {1}, {2}, {3}".format(doc[1], doc[0]['text'].encode('utf-8').translate(None, exclude), doc[0]['document']['id'], top.encode('utf-8'))
            # Printing the same line to the codecs file raises the UnicodeDecodeError:
            print >> c, "{0}, {1}, {2}, {3}".format(doc[1], doc[0]['text'].encode('utf-8').translate(None, exclude), doc[0]['document']['id'], top.encode('utf-8'))

get_documents_cleaned(my_topics, "export_c.csv")  # my_topics is defined elsewhere

The value of doc[0]['text'] looks like this:

u' \u3001 \u6010...'

Answer 1

Since your first print statement works, it is clear that the format function is not what raises the UnicodeDecodeError.

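Instead, the problem is the file writer. c expects a unicode object, but it receives a UTF-8 encoded str object instead (let's call it s). c therefore tries to call s.decode() implicitly, using Python 2's default ASCII codec, and that is what raises the UnicodeDecodeError.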
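The failing step can be reproduced without any file. This is a minimal sketch that runs the implicit decode explicitly, so it behaves the same on Python 2 and Python 3; the sample text is the two characters from the question's doc[0]['text']. U+3001 encodes to the bytes e3 80 81 in UTF-8, which is why the error names byte 0xe3. In the question it is reported at position 24 simply because the 24 bytes before it in the formatted line are plain ASCII.

```python
# Unicode text, like the question's doc[0]['text'].
text = u"\u3001\u6010"

# UTF-8 encoded bytes, like the result of the question's .encode('utf-8') calls.
s = text.encode("utf-8")
print(repr(s[:3]))  # the three UTF-8 bytes of U+3001

# What the codecs writer effectively does with a byte string on Python 2:
# decode it with the default ASCII codec before re-encoding it.
try:
    s.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding with the codec that actually produced the bytes round-trips cleanly.
print(s.decode("utf-8") == text)
```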

You can fix this either by calling s.decode('utf-8') before printing, or by opening the file with Python's built-in open(filename, "w") instead of codecs.open.
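Both fixes can be sketched side by side. This is a hypothetical example rather than the asker's code: document_results, score, top and the sample data are stand-ins shaped like the question's doc tuples, and proj.get is left out. write_unicode keeps codecs.open but builds each line as unicode, so nothing is encoded by hand; write_bytes encodes each line once and writes the bytes to a file from the built-in open. It uses mode "wb" rather than "w" so the sketch also runs on Python 3, where "w" gives a text file that rejects bytes; on Python 2 both modes accept a str. Because unicode.translate takes a mapping rather than a deletechars argument, punctuation is stripped with a dict from code points to None.

```python
import codecs
import os
import string
import tempfile

# Hypothetical sample shaped like the question's results: (document, score) pairs,
# so doc[0]['text'], doc[0]['document']['id'] and doc[1] line up with the original.
document_results = [
    ({"text": u"\u3001 \u6010, hello!", "document": {"id": u"doc-1"}}, 0.9),
]
top = u"\u6010"  # hypothetical topic, kept as unicode

# unicode.translate deletes every code point mapped to None.
DELETE_PUNCT = dict((ord(ch), None) for ch in string.punctuation)


def format_line(doc, score):
    """Build one output line as unicode; nothing is encoded yet."""
    return u"{0}, {1}, {2}, {3}\n".format(
        score, doc["text"].translate(DELETE_PUNCT), doc["document"]["id"], top)


def write_unicode(path):
    """Fix 1: keep codecs.open, but hand the writer unicode objects only."""
    with codecs.open(path, "w", "utf-8") as c:
        for doc, score in document_results:
            c.write(format_line(doc, score))  # the writer encodes to UTF-8


def write_bytes(path):
    """Fix 2: encode once yourself and write bytes to a plain file."""
    with open(path, "wb") as f:
        for doc, score in document_results:
            f.write(format_line(doc, score).encode("utf-8"))


out_dir = tempfile.mkdtemp()
path_a = os.path.join(out_dir, "export_a.csv")
path_b = os.path.join(out_dir, "export_b.csv")
write_unicode(path_a)
write_bytes(path_b)

with open(path_a, "rb") as f:
    data_a = f.read()
with open(path_b, "rb") as f:
    data_b = f.read()
print(data_a == data_b)
```

Both files come out byte-for-byte identical. The common idea is to keep text as unicode while formatting and to encode it exactly once, at the point where it is written.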

writing utf-8 encoded text to a file
