Exporting data from a CSV file that contains Unicode characters

Time: 2013-05-15 05:13:45

Tags: python csv unicode

I want to export data from a CSV file that contains Unicode strings.

Previously I tried a Python script that works only with ASCII data; it does not support Unicode:

#!/usr/bin/env python
import csv

csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

with open('input.csv') as ifile:
    data = csv.reader(ifile, dialect='custom')
    for record in data:
        for i, field in enumerate(record):
            print(" <field%s>" % i + field + "</field%s>" % i)

Traceback (most recent call last):
    for record in data:
_csv.Error: line contains NULL byte

3 Answers:

Answer 0: (score: 2)

Use the unicodecsv library instead:

https://github.com/jdunck/python-unicodecsv

import unicodecsv as csv

# unicodecsv decodes the bytes itself, so open the file in binary mode.
with open('input.csv', 'rb') as ifile:
    rows = list(csv.reader(ifile, encoding='utf-8'))

print(rows)

Answer 1: (score: 1)

You can wrap csv.reader in a class that handles the decoding for you. The following is taken from the csv documentation examples and works for me:

#!/usr/bin/env python
# Python 2 only: relies on the .next() iterator protocol and the unicode built-in.
import csv, codecs

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self




csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

# Open in binary mode; UTF8Recoder does the decoding.
with open('input.csv', 'rb') as ifile:
    data = UnicodeReader(ifile, dialect='custom')
    for record in data:
        for i, field in enumerate(record):
            print (" <field%s>" % i + field + "</field%s>" % i)

If you need to write such files as well, the same documentation examples include a UnicodeWriter class.
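
For readers on Python 3, the wrapper classes should not be needed, because the csv module there reads and writes text rather than bytes. Here is a minimal sketch of the writing side, assuming Python 3. The name write_rows is a hypothetical helper, not part of the csv module or of the documentation example, and io.StringIO stands in for a real file:

```python
import csv
import io


def write_rows(stream, rows, dialect="excel"):
    """Write rows of text to an already-open text stream as CSV (hypothetical helper)."""
    writer = csv.writer(stream, dialect=dialect)
    writer.writerows(rows)


# io.StringIO stands in for a file opened with
# open(path, "w", newline="", encoding="utf-8").
buffer = io.StringIO(newline="")
write_rows(buffer, [["city", "note"], ["München", "naïve, quoted"]])
csv_text = buffer.getvalue()
```

With a real file, the encoding is chosen once in open() and the csv writer never sees bytes.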

Answer 2: (score: 0)

You appear to be using Python 3. Follow the very first code example in the docs:

#!/usr/bin/env python3
import csv

encoding = 'utf-16'  # replace with your file's actual character encoding

with open('input.csv', newline='', encoding=encoding) as csvfile:
    reader = csv.reader(csvfile, dialect="custom")
    for row in reader:
        print(", ".join(row))

Here the "custom" dialect is the one defined in the code in your question, and encoding is the character encoding of your file, e.g. "utf-16". If you omit the encoding argument, the encoding returned by locale.getpreferredencoding(False) is used.
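
A "line contains NULL byte" error is often a sign that the file is UTF-16, since that encoding stores each ASCII character with a zero byte next to it. That is only a guess about your file. The sketch below, assuming Python 3, builds a small UTF-16 sample, confirms that its raw bytes contain NUL, and reads it back after guessing the encoding from the byte-order mark. sniff_bom_encoding and read_rows are hypothetical helper names, and a file without a BOM falls back to the default you pass in:

```python
import codecs
import csv
import os
import tempfile


def sniff_bom_encoding(path, default="utf-8"):
    """Guess an encoding from a leading byte-order mark (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default


def read_rows(path, encoding=None):
    """Read every row of a CSV file, using the given or the sniffed encoding."""
    encoding = encoding or sniff_bom_encoding(path)
    with open(path, newline="", encoding=encoding) as csvfile:
        return list(csv.reader(csvfile))


# Build a small UTF-16 sample file so the sketch is self-contained.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(sample_path, "w", newline="", encoding="utf-16") as f:
    csv.writer(f).writerows([["id", "name"], ["1", "Zoë"], ["2", "日本語"]])

with open(sample_path, "rb") as f:
    raw = f.read()
has_nul = b"\x00" in raw  # True here: UTF-16 pads ASCII characters with zero bytes

detected = sniff_bom_encoding(sample_path)
rows = read_rows(sample_path)
os.remove(sample_path)
```

If the BOM check finds nothing, you still have to know or look up the encoding yourself; the sniffing covers only the BOM case.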