Python 2 and 3 csv reader

Time: 2011-03-03 12:12:19

Tags: python encoding csv python-3.x

I'm trying to read a UTF-8 CSV file with the csv module, and because of the encoding I'm having trouble writing code that works on both Python 2 and 3.

Here is the original code in Python 2.7:

with open(filename, 'rb') as csvfile:
    csv_reader = csv.reader(csvfile, quotechar='\"')
    langs = next(csv_reader)[1:]
    for row in csv_reader:
        pass

But when I run it with Python 3, it doesn't like the fact that I open the file without an encoding. I tried this:

with codecs.open(filename, 'r', encoding='utf-8') as csvfile:
    csv_reader = csv.reader(csvfile, quotechar='\"')
    langs = next(csv_reader)[1:]
    for row in csv_reader:
        pass

Now Python 2 can't decode the line in the for loop. So... how should I do this?

3 answers:

Answer 0 (score: 13)

Indeed, in Python 2 the file should be opened in binary mode, but in Python 3 in text mode. Also, in Python 3 newline='' should be specified (which you forgot).

You'll have to open the file in an if-block.

import sys

if sys.version_info[0] < 3: 
    infile = open(filename, 'rb')
else:
    infile = open(filename, 'r', newline='', encoding='utf8')


with infile as csvfile:
    ...
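The if-block above can be wrapped in a small helper so the version check lives in one place. The following is only a sketch: the names open_csv_file and demo, and the sample data, are made up for illustration and are not part of the answer. The sample row contains a quoted field with an embedded line break, which is the case where newline='' matters on Python 3.

```python
import csv
import os
import sys
import tempfile


def open_csv_file(filename, encoding='utf-8'):
    """Open filename the way the csv module expects on this Python version."""
    if sys.version_info[0] < 3:
        # Python 2: csv.reader works on byte strings, so use binary mode.
        return open(filename, 'rb')
    # Python 3: csv.reader works on text; newline='' leaves line endings
    # untouched so the csv module can handle them itself.
    return open(filename, 'r', newline='', encoding=encoding)


def demo():
    # Hypothetical sample file: the second row has a quoted field that
    # contains an embedded "\r\n".
    data = u'key,en\r\ngreeting,"hello\r\nworld"\r\n'.encode('utf-8')
    fd, path = tempfile.mkstemp(suffix='.csv')
    try:
        with os.fdopen(fd, 'wb') as out:
            out.write(data)
        with open_csv_file(path) as csvfile:
            return list(csv.reader(csvfile))
    finally:
        os.remove(path)


if __name__ == '__main__':
    for row in demo():
        print(row)
```

With newline='' the embedded line break should come back inside the second field rather than splitting the row in two.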

Answer 1 (score: 2)

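Update: While the code in the original answer works, I have in the meantime released a small package at https://pypi.python.org/pypi/csv342 that provides a Python 3-like interface for Python 2. So, independent of your Python version, you can simply do: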

import csv342 as csv
import io
with io.open('some.csv', 'r', encoding='utf-8', newline='') as csv_file:
    for row in csv.reader(csv_file, delimiter='|'):
        print(row)

Original answer: Here is a solution that, even on Python 2, actually decodes the text to Unicode strings and therefore works with encodings other than UTF-8.

The code below defines a function csv_rows() that yields the contents of a file as a sequence of lists, one list per row. Example usage:

for row in csv_rows('some.csv', encoding='iso-8859-15', delimiter='|'):
    print(row)

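Here are the two variants of csv_rows(): one for Python 3+ and one for Python 2.6+. The appropriate variant is chosen automatically at runtime. UTF8Recoder and UnicodeReader are verbatim copies of the examples in the Python 2.7 library documentation.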

import csv
import io
import sys


if sys.version_info[0] >= 3:
    # Python 3 variant.
    def csv_rows(csv_path, encoding, **keywords):
        with io.open(csv_path, 'r', newline='', encoding=encoding) as csv_file:
            for row in csv.reader(csv_file, **keywords):
                yield row

else:
    # Python 2 variant.
    import codecs

    class UTF8Recoder:
        """
        Iterator that reads an encoded stream and reencodes the input to UTF-8
        """
        def __init__(self, f, encoding):
            self.reader = codecs.getreader(encoding)(f)

        def __iter__(self):
            return self

        def next(self):
            return self.reader.next().encode("utf-8")


    class UnicodeReader:
        """
        A CSV reader which will iterate over lines in the CSV file "f",
        which is encoded in the given encoding.
        """

        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
            f = UTF8Recoder(f, encoding)
            self.reader = csv.reader(f, dialect=dialect, **kwds)

        def next(self):
            row = self.reader.next()
            return [unicode(s, "utf-8") for s in row]

        def __iter__(self):
            return self


    def csv_rows(csv_path, encoding, **kwds):
        with io.open(csv_path, 'rb') as csv_file:
            for row in UnicodeReader(csv_file, encoding=encoding, **kwds):
                yield row
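To check a helper like csv_rows() against an encoding other than UTF-8, one option is to write a tiny sample file and read it back. This is a rough sketch for Python 3 only: it repeats just the Python 3 variant so that it runs standalone, and the round_trip function and the sample data are made up for illustration.

```python
import csv
import io
import os
import tempfile


def csv_rows(csv_path, encoding, **keywords):
    # Python 3 variant from the answer, repeated so the sketch is standalone.
    with io.open(csv_path, 'r', newline='', encoding=encoding) as csv_file:
        for row in csv.reader(csv_file, **keywords):
            yield row


def round_trip(encoding='iso-8859-15'):
    # Hypothetical sample data with two non-ASCII characters:
    # e-acute (U+00E9) and the euro sign (U+20AC).
    text = u'item|price\r\ncaf\u00e9|3\u20ac\r\n'
    fd, path = tempfile.mkstemp(suffix='.csv')
    try:
        with os.fdopen(fd, 'wb') as out:
            out.write(text.encode(encoding))
        return list(csv_rows(path, encoding=encoding, delimiter='|'))
    finally:
        os.remove(path)


if __name__ == '__main__':
    for row in round_trip():
        print(row)
```

If the encoding passed to csv_rows() matches the one used to write the file, the non-ASCII characters should come back unchanged as text.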

Answer 2 (score: 0)

An old question, I know, but I was looking into how to do this. Posting in case someone comes across this and finds it useful.

This is how I solved my problem, thanks to a hint from Lennart Regebro:

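import csv
import sys

if sys.version_info[0] >= 3:
    # Python 3: text mode with an explicit encoding and newline=''.
    rd = csv.reader(open(input_file, 'r', newline='', encoding='iso8859-1'),
                    delimiter=';', quotechar='"')
else:
    # Python 2: binary mode; the csv module reads byte strings.
    rd = csv.reader(open(input_file, 'rb'), delimiter=';', quotechar='"')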

Then do whatever you need to do:

for row in rd:
       ......
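One thing this approach leaves open: on Python 2 the rows come back as byte strings, while on Python 3 they are already text. If both versions should hand the same type to the rest of the program, the cells can be decoded after parsing. The helper below is a sketch and the name decode_row is made up. It assumes an ASCII-compatible encoding such as ISO-8859-1 or UTF-8, since the Python 2 csv module parses raw bytes.

```python
def decode_row(row, encoding='iso8859-1'):
    """Return the row with any byte-string cells decoded to text."""
    # On Python 2, str is bytes, so every cell gets decoded.
    # On Python 3, csv.reader already yields str, so cells pass through.
    return [cell.decode(encoding) if isinstance(cell, bytes) else cell
            for cell in row]


if __name__ == '__main__':
    # Hypothetical row mixing a byte-string cell and a text cell.
    print(decode_row([b'caf\xe9', u'ok']))
```

Inside the loop above this would be used as row = decode_row(row).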