Question

I want to write data to a file, where a row in the CSV should look like this list (straight from the Python console):

row = ['\xef\xbb\xbft_11651497', 'http://kozbeszerzes.ceu.hu/entity/t/11651497.xml', "Szabolcs Mag '98 Kft.", 'ny\xc3\xadregyh\xc3\xa1za', 'ny\xc3\xadregyh\xc3\xa1za', '4400', 't\xc3\xbcnde utca 20.', 47.935175, 21.744975, u'Ny\xedregyh\xe1za', u'Borb\xe1nya', u'Szabolcs-Szatm\xe1r-Bereg', u'Ny\xedregyh\xe1zai', u'20', u'T\xfcnde utca', u'Magyarorsz\xe1g', u'4405']

The csv module in Py2k does not do Unicode, but I have a UnicodeWriter wrapper:

import csv, cStringIO, codecs
class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

However, these lines still produce the dreaded encoding error message below:

f.write(codecs.BOM_UTF8)
writer = UnicodeWriter(f)
writer.writerow(row)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)

What should I do? Thanks!

Answer 1

You are passing in byte strings containing non-ASCII data, and these are being decoded to Unicode using the default codec at this line:

self.writer.writerow([unicode(s).encode("utf-8") for s in row])

unicode(bytestring) fails when the data cannot be decoded as ASCII:

>>> unicode('\xef\xbb\xbft_11651497')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
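
On Python 2, calling unicode() on a byte string without naming an encoding falls back to the default codec, which is normally ASCII. The sketch below reproduces the same failure with an explicit decode and then shows the decode succeeding once the real encoding is named. The variable names are illustrative, and the b'' literal is only there so the snippet also runs on Python 3.

```python
# First field of the question's row: a UTF-8 BOM followed by ASCII text.
raw = b'\xef\xbb\xbft_11651497'

# Decoding as ASCII fails on byte 0xef, just like unicode(raw) does on Python 2.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the actual encoding works; the BOM bytes become the character U+FEFF.
text = raw.decode('utf-8')
```

Note that the plain 'utf-8' codec keeps the BOM as a leading U+FEFF character in the decoded value.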

Decode the data to Unicode before passing it to the writer:

row = [v.decode('utf8') if isinstance(v, str) else v for v in row]

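This assumes that your bytestring values contain UTF-8 data. If you have a mix of encodings, try to decode to Unicode at the point of origin, that is, where your program first obtains the data. You really want to do that anyway, regardless of where the data came from or whether it was already encoded as UTF-8.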
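
As a rough sketch of that decoding step, assuming every byte string really is UTF-8, a small helper can normalize a row before it reaches the writer. The name to_text is hypothetical, and the choice of the 'utf-8-sig' codec is an assumption: it decodes UTF-8 and additionally drops a leading BOM, which seems reasonable here because the question already writes codecs.BOM_UTF8 to the file itself.

```python
def to_text(value, encoding='utf-8-sig'):
    """Hypothetical helper: decode byte strings, pass other values through."""
    # On Python 2, `bytes` is an alias of `str`, so this check works on 2 and 3.
    if isinstance(value, bytes):
        # 'utf-8-sig' decodes UTF-8 and strips a leading BOM if one is present.
        return value.decode(encoding)
    return value

# A shortened version of the question's row, mixing bytes, floats and unicode.
row = [b'\xef\xbb\xbft_11651497', b'ny\xc3\xadregyh\xc3\xa1za',
       47.935175, u'Ny\xedregyh\xe1za']
decoded = [to_text(v) for v in row]
```

After this step the row holds only unicode strings and numbers, so the unicode(s) call inside UnicodeWriter.writerow no longer has to guess an encoding.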
