Question

csv.DictReader seems to skip empty lines even when restval is set. With the following code, empty lines in the input file are skipped:

import csv
import sys
CSV_FIELDS = ("field1", "field2", "field3")
for row in csv.DictReader(open("f"), fieldnames=CSV_FIELDS, restval=""):
    if not row or not row[CSV_FIELDS[0]]:
        sys.exit("never reached, why?")

where the file f contains:

1,2,3


a,b,c

Answer 1

Inside the csv.DictReader class (shown as it appears in Python 3's csv.py; Python 2 spelled the call self.reader.next()):

    # unlike the basic reader, we prefer not to return blanks,
    # because we will typically wind up with a dict full of None
    # values
    while row == []:
        row = next(self.reader)

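So empty lines are skipped before restval is ever applied. If you don't want empty lines to be skipped, you can use csv.reader instead.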
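
As a minimal sketch of the csv.reader route: the basic reader hands back an empty list for every blank line, so you can decide yourself what to do with it. The in-memory sample below is only a stand-in for the file f from the question.

```python
import csv
import io

# In-memory stand-in for the file "f" from the question.
sample = io.StringIO("1,2,3\n\n\na,b,c\n")

rows = list(csv.reader(sample))
for row in rows:
    # A blank line comes back as [], which is falsy.
    print(row if row else "<empty line>")
```

Each `[]` marks one blank input line, which is exactly what DictReader discards in the loop quoted above.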

Another option is to subclass csv.DictReader:

import csv
CSV_FIELDS = ("field1", "field2", "field3")

class MyDictReader(csv.DictReader):
    # Copy of csv.DictReader.__next__ (Python 3) without the loop
    # that skips empty rows.
    def __next__(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = next(self.reader)
        self.line_num = self.reader.line_num

        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d

with open("f", newline="") as f:
    for row in MyDictReader(f, fieldnames=CSV_FIELDS, restval=""):
        print(row)

This yields:

{'field1': '1', 'field2': '2', 'field3': '3'}
{'field1': '', 'field2': '', 'field3': ''}
{'field1': '', 'field2': '', 'field3': ''}
{'field1': 'a', 'field2': 'b', 'field3': 'c'}

Answer 2

Unutbu has already pointed out why this happens. A quick fix is to replace each empty line with ',' before passing the lines to DictReader; restval then takes care of the rest.

import csv

CSV_FIELDS = ("field1", "field2", "field3")

with open('test.csv', newline='') as f:
    # Turn whitespace-only lines into ',' so DictReader sees a short row
    lines = (',' if line.isspace() else line for line in f)
    for row in csv.DictReader(lines, fieldnames=CSV_FIELDS, restval=""):
        print(row)

# output (Python 3.8+, where DictReader returns plain dicts)
{'field1': '1', 'field2': '2', 'field3': '3'}
{'field1': '', 'field2': '', 'field3': ''}
{'field1': '', 'field2': '', 'field3': ''}
{'field1': 'a', 'field2': 'b', 'field3': 'c'}

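Update

The code above does not work when a quoted value spans several lines and contains blank lines, because those blank lines would be replaced with ',' as well. In that case you can use csv.reader directly, like this: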

import csv

CSV_FIELDS = ("field1", "field2", "field3")
RESTVAL = ''

with open('test.csv', newline='') as f:
    for row in csv.reader(f, quotechar='"'):
        if not row:
            # Every key shares the one RESTVAL object; if RESTVAL is
            # mutable, build a fresh copy per key instead.
            print(dict.fromkeys(CSV_FIELDS, RESTVAL))
        else:
            # Empty values are replaced with RESTVAL as well
            print({k: v if v else RESTVAL for k, v in zip(CSV_FIELDS, row)})

If the file contains:

1,2,"


4"


a,b,c

then the output will be:

{'field1': '1', 'field2': '2', 'field3': '\n\n\n4'}
{'field1': '', 'field2': '', 'field3': ''}
{'field1': '', 'field2': '', 'field3': ''}
{'field1': 'a', 'field2': 'b', 'field3': 'c'}
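
The same idea can be wrapped in a small generator so that short rows are padded too. This is only a sketch: `dict_rows` is a hypothetical helper, not part of the csv module, and unlike the loop above it pads missing columns without replacing empty strings that are already present. The last sample line (`x`) is added here purely to show the padding.

```python
import csv
import io
from itertools import zip_longest

CSV_FIELDS = ("field1", "field2", "field3")

def dict_rows(lines, fieldnames, restval=""):
    """Yield one dict per CSV record, including records from blank lines.

    Hypothetical helper for illustration; not part of the csv module.
    """
    for row in csv.reader(lines):
        # Drop any extra values, then let zip_longest pad short
        # (or empty) rows with restval.
        values = row[:len(fieldnames)]
        yield dict(zip_longest(fieldnames, values, fillvalue=restval))

# In-memory stand-in for test.csv, with one short row appended.
sample = io.StringIO('1,2,"\n\n\n4"\n\n\na,b,c\nx\n')
result = list(dict_rows(sample, CSV_FIELDS))
for d in result:
    print(d)
```

Because the blank lines inside the quoted value are consumed by csv.reader itself, only the two blank lines between records turn into all-restval dicts.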

Answer 3

This is your file:

1,2,3
,,
,,
a,b,c

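I added the commas, and now the two formerly empty lines are each read as {'field1': '', 'field2': '', 'field3': ''}. The restval parameter only says that if you declared fields and a row is missing some of them, the missing fields get this value.

So you declared three fields, and each row now has three values. But restval is about missing "columns", not missing lines.

Your line is completely empty, so DictReader skips it, unless you add commas to tell it that there are empty values to read.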
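
A small sketch of that distinction, using an in-memory sample and a visible restval of "N/A" (both chosen here only for illustration): the short row gets padded, the blank line is dropped, and the comma-only line is kept with empty strings.

```python
import csv
import io

CSV_FIELDS = ("field1", "field2", "field3")

# Line 2 is short, line 3 is blank, line 4 holds only commas.
sample = io.StringIO("1,2,3\nx\n\n,,\n")

reader = csv.DictReader(sample, fieldnames=CSV_FIELDS, restval="N/A")
rows = [dict(r) for r in reader]
for r in rows:
    print(r)
```

Only three rows come back from four input lines: restval fills the missing columns of `x`, but it never gets a chance to act on the blank line.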
