How do I skip pre-header lines with csv.DictReader?

Asked: 2011-09-28 19:20:59

Tags: python csv

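I want csv.DictReader to infer the field names from the file. The docs say "If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames", but in my case the first row contains a title and the second row contains the names.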

I can't apply next(reader) as suggested in "Python 3.2 skip a line in csv.DictReader", because the field names are assigned when the reader is initialized (or I'm doing it wrong).
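To illustrate the problem, here is a minimal sketch (Python 3, using made-up in-memory data as a stand-in for the real file). It shows that DictReader takes whichever row it reads first as the field names, so calling next(reader) discards a later row instead of the title row. The SAMPLE contents are an assumption for demonstration only.

```python
import csv
import io

# Hypothetical stand-in for the real file: a title row, the real header, one data row.
SAMPLE = (
    "CanVec v1.1.0,,\n"
    "Entity,Generic Code,Theme\n"
    "Amusement park,2260009,LX\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE))

# The first row the reader consumes becomes the field names,
# so this call returns the *header* row keyed by the title row.
first = next(reader)

print(reader.fieldnames)  # the title row, not the real header
print(first)
```

The skipping therefore has to happen on the underlying file object, before DictReader reads anything from it.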

The csvfile (exported from Excel 2010, original source):

CanVec v1.1.0,,,,,,,,,^M
Entity,Attributes combination,"Specification Code
Point","Specification Code
Line","Specification Code
Area",Generic Code,Theme,"GML - Entity name
Shape - File name
Point","GML - Entity name
Shape - File name
Line","GML - Entity name
Shape - File name
Area"^M
Amusement park,Amusement park,,,2260012,2260009,LX,,,LX_2260009_2^M
Auto wrecker,Auto wrecker,,,2360012,2360009,IC,,,IC_2360009_2^M

My code:

import csv

# entities_table holds the path to the CSV file shown above
f = open(entities_table,'rb')
try:
    dialect = csv.Sniffer().sniff(f.read(1024))
    f.seek(0)

    reader = csv.DictReader(f, dialect=dialect)
    print 'I think the field names are:\n%s\n' % (reader.fieldnames)

    i = 0
    for row in reader:
        if i < 20:
            print row
            i = i + 1

finally:
    f.close()

Current result:

I think the field names are:
['CanVec v1.1.0', '', '', '', '', '', '', '', '', '']

Desired result:

I think the field names are:
['Entity','Attributes combination','"Specification Code Point"',...snip]

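I realize it would be expedient to simply delete the first line and carry on, but I'm trying to read the data in place as far as possible and keep manual intervention to a minimum.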

2 answers:

Answer 0: (score: 12)

After f.seek(0), insert:

next(f)

This advances the file pointer to the second line before the DictReader is initialized.
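For reference, a self-contained sketch of this approach in Python 3. The in-memory SAMPLE data and the helper name read_skipping_title are assumptions made for illustration; they are not from the question. Note that next(f) skips one physical line, so this assumes the title row contains no quoted field with an embedded newline.

```python
import csv
import io

# Hypothetical sample modelled on the question's file: a title row, then the real header.
SAMPLE = (
    "CanVec v1.1.0,,\n"
    "Entity,Generic Code,Theme\n"
    "Amusement park,2260009,LX\n"
    "Auto wrecker,2360009,IC\n"
)


def read_skipping_title(f, skip=1):
    """Skip `skip` physical lines, then let DictReader infer the field names."""
    for _ in range(skip):
        next(f, None)  # advance past one pre-header line; tolerate short files
    reader = csv.DictReader(f)
    return reader.fieldnames, list(reader)


fieldnames, rows = read_skipping_title(io.StringIO(SAMPLE))
print(fieldnames)
print(rows[0]["Entity"])
```

With a real file, the same helper could be called on an object opened with open(path, newline=""), which is the mode the csv documentation recommends in Python 3.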

Answer 1: (score: 1)

I used islice from itertools. My header was the last line of a large preamble. I skipped over the preamble and used the header line as the field names:

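import csv
from itertools import islice

# Pass over the preamble and find the header line
n = 0
with open(file, "r") as f:
    for line in f:
        n += 1
        if 'same_field_name' in line:  # line with field names was found
            h = line.rstrip('\r\n').split(',')  # drop the line ending from the last name
            break

# Reopen the file and skip the first n lines (preamble plus header line)
with open(file, "r") as f:
    reader = csv.DictReader(islice(f, n, None), fieldnames=h)
    for row in reader:
        print(row)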
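A possible single-pass variant of the same idea, sketched under some assumptions: the header fits on one physical line, and it can be recognised by a marker substring. The helper name dictreader_from_marker and the SAMPLE data are hypothetical. Instead of reopening the file, it keeps reading from the same handle once the header is found, and it parses the header with csv.reader so that quoted names are handled.

```python
import csv
import io


def dictreader_from_marker(f, marker):
    """Advance f to the first line containing `marker` and use that line as the header."""
    for line in f:
        if marker in line:
            # Parse the header with csv so quoted names and embedded commas work.
            fieldnames = next(csv.reader([line]))
            # f is now positioned just after the header line.
            return csv.DictReader(f, fieldnames=fieldnames)
    raise ValueError("no header line containing %r was found" % marker)


# Hypothetical preamble followed by a header and one data row.
SAMPLE = (
    "Some preamble\n"
    "more notes, with commas\n"
    "Entity,Theme\n"
    "Amusement park,LX\n"
)

reader = dictreader_from_marker(io.StringIO(SAMPLE), "Entity")
rows = list(reader)
print(reader.fieldnames)
print(rows)
```

This avoids counting lines and opening the file twice, at the cost of relying on the file object continuing from where the loop stopped.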