Question

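I'm a noob coder and I'm having trouble parsing a CSV file with the Python csv module. My output shows that the field values in the row are None for every field except the first.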

Here is the first row of the ugly CSV file I'm trying to parse (the remaining rows follow the same format):

0,213726,NORTH FORK SLATE CREEK,CAMPGROUND,North Fork Slate Creek Campground | Idaho |      Public Lands Information Center | Recreation Search, http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268,NA,NA,NA,NA,(208)839-2211,"Nez Perce National Forest  Operating Days: 305<br>Total Capacity: 25<br>

5 campsites at the confluence of Slate Creek and its North Fork. A number of trails form loops in the area. These are open to most traffic, including trail bikes.","From Slate Creek, go 8 miles east on Forest Road 354.",NA,http://www.publiclands.org/explore/reg_nat_forest.php?region=7&forest_name=Nez%20Perce%20National%20Forest,NA,NA,NA,45.6,-116.1,NA,N,0,1103,2058

Here is the code I wrote to parse the CSV file (it doesn't work!):

import csv

#READER SETTINGS
f_path = '/Users/foo'
f_handler = open(f_path, 'rU').read().replace('\n',' ')
my_fieldnames = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 
'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 
'col16', 'col17', 'col18', 'col19', 'col20', 'col21', 'col22', 'col23', 
'col24','col25']
f_reader = csv.DictReader(f_handler, fieldnames=my_fieldnames, delimiter=',', dialect=csv.excel)

#NOW I TRY TO PARSE THE CSV FILE
i = 0
for row in f_reader:
    print "my first row was %s" % row
    i = i + 1
    if i > 0:
        break

Here is the output. It says every field except the first is blank, and I have no idea why! Any suggestions would be greatly appreciated.

my first row was {'col14': None, 'col15': None, 'col16': None, 
'col17': None, 'col10': None, 'col11': None, 'col12': None, 
'col13': None, 'col18': None, 'col19': None, 'col2': None, 'col8': None, 
'col9': None, 'col6': None, 'col7': None, 'col4': None, 'col5': None, 
'col3': None, 'col1': '0', 'col25': None, 'col24': None, 
'col21': None, 'col20': None, 'col23': None, 'col22': None}

Answer 1

Try this:

#!/usr/bin/env python

import csv

my_fieldnames = ['col' + str(i) for i in range(1,26)]

with open('input.csv', 'rb') as csvfile:
    my_reader = csv.DictReader(csvfile, fieldnames=my_fieldnames,
                               delimiter=',', dialect=csv.excel,
                               quoting=csv.QUOTE_NONE)

    for row in my_reader:
        for k,v in row.iteritems():
            print k, v

Output for the first line of input (keep in mind that dictionaries are unordered):

col14 None
col15 None
col16 None
col17 None
col10 NA
col11 (208)839-2211
col12 "Nez Perce National Forest  Operating Days: 305<br>Total Capacity: 25<br>
col13 None
col18 None
col19 None
col8 NA
col9 NA
col6  http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268
col7 NA
col4 CAMPGROUND
col5 North Fork Slate Creek Campground | Idaho |      Public Lands Information Center | Recreation Search
col2 213726
col3 NORTH FORK SLATE CREEK
col1 0
col25 None
col24 None
col21 None
col20 None
col23 None
col22 None
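For comparison, here is a minimal Python 3.8+ sketch (not the answerer's code) that keeps the default quoting instead of csv.QUOTE_NONE. In the output above, QUOTE_NONE leaves a literal double quote at the start of col12 and cuts the description off at its first line break. With default quoting, the quoted field comes back as one value, embedded newline included. The sample record, the six column names, and read_rows are hypothetical stand-ins. In Python 3 a real file would be opened in text mode with newline='' rather than 'rb'.

```python
import csv
import io

# Hypothetical, shortened stand-in for the question's first record.
# The fifth field is quoted and contains a line break, like the
# "Nez Perce National Forest ..." description in the question.
SAMPLE = (
    '0,213726,NORTH FORK SLATE CREEK,CAMPGROUND,'
    '"Nez Perce National Forest<br>\n5 campsites at the confluence.",NA\n'
)

# Illustrative column names, one per field in SAMPLE.
FIELDNAMES = ['col' + str(i) for i in range(1, 7)]


def read_rows(csvfile):
    """Return a DictReader over an open text-mode file object.

    With the default quoting (csv.QUOTE_MINIMAL) the reader honours the
    double quotes, so the embedded line break stays inside the value.
    """
    return csv.DictReader(csvfile, fieldnames=FIELDNAMES)


# For a real file: with open('input.csv', newline='') as f: rows = list(read_rows(f))
rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[0]['col5'])
```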

Answer 2

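What different software systems call "CSV" varies a lot. Fortunately, Python's excellent csv module is very good at handling these details, so you don't have to deal with them by hand.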

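Let me highlight something @metaperture's answer uses without explaining it: you can avoid most of the guesswork of reading CSV files in Python by letting csv.Sniffer auto-detect the dialect. Once you have that part sorted out, there isn't much left that can go wrong.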

Here is a simple example:

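    import csv

    # Python 2 code; in Python 3 open the file with open(filename, newline='')
    with open(filename, 'rb') as csvfile:
        # Guess the dialect (delimiter, quoting, ...) from the start of the file
        dialect = csv.Sniffer().sniff(csvfile.read(10024))
        csvfile.seek(0)  # rewind so the reader starts from the first line
        qreader = csv.reader(csvfile, dialect)
        cnt = 0
        for item in qreader:
            if cnt > 0:
                pass  # process your data
            else:
                pass  # the header of the csv file (field names)
            cnt = cnt + 1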
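A Python 3 variant of the same idea, sketched under the assumption that the first row is a header. It opens the file in text mode (Python 3's csv.reader expects str, not the bytes a 'rb' file yields), sniffs a sample, rewinds, and takes the header with next() instead of a counter. The semicolon-delimited sample data and the function name read_with_sniffed_dialect are hypothetical.

```python
import csv
import io

# Hypothetical semicolon-delimited data standing in for a real file.
SAMPLE = 'id;site;capacity\n1;North Fork;25\n2;Slate Creek;10\n'


def read_with_sniffed_dialect(csvfile, sample_size=10240):
    """Sniff the dialect from the start of csvfile, then parse all of it.

    Assumes the first row holds the field names; returns (header, rows).
    """
    dialect = csv.Sniffer().sniff(csvfile.read(sample_size))
    csvfile.seek(0)  # rewind so the reader starts at the first line
    reader = csv.reader(csvfile, dialect)
    header = next(reader)  # first row: the field names
    return header, list(reader)


# For a real file: open(filename, newline='') in text mode.
header, data = read_with_sniffed_dialect(io.StringIO(SAMPLE))
print(header)  # ['id', 'site', 'capacity']
print(data)    # [['1', 'North Fork', '25'], ['2', 'Slate Creek', '10']]
```

Sniffing is heuristic, so on unusual data it can guess wrong. In that case, passing delimiter explicitly to csv.reader is the safer choice.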

Answer 3

When you do this:

f_handler = open(f_path, 'rU').read().replace('\n',' ')

you remove all the newlines, which is how the csv.excel dialect detects new rows. Since the file is now a single line, you only get one row back.
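A closer look at what the reader receives: .read() returns a single str, and csv.DictReader iterates over whatever it is given. For a str, that means one character at a time, so the first "line" the reader sees is just 0. That matches the question's output: only col1 is filled, and the rest default to None. Below is a minimal Python 3.8+ sketch with hypothetical, shortened data.

```python
import csv

# Shortened, hypothetical stand-in for the question's data after
# .read().replace('\n', ' '): one long str with no newlines left.
text = '0,213726,NORTH FORK SLATE CREEK,CAMPGROUND'
fieldnames = ['col1', 'col2', 'col3', 'col4']

# Passing the str itself: iterating a str yields single characters,
# so the reader treats each character as a separate line.
broken = next(csv.DictReader(text, fieldnames=fieldnames))
print(broken)  # {'col1': '0', 'col2': None, 'col3': None, 'col4': None}

# Passing a list of lines instead lets the reader see the whole record.
fixed = next(csv.DictReader([text], fieldnames=fieldnames))
print(fixed['col3'])  # NORTH FORK SLATE CREEK
```

In practice, the simplest fix is to pass the open file object itself to DictReader and skip .read() and .replace() entirely.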

Also, you are doing:

if i > 0:
    break

which ends your for loop after the first iteration.
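Because i is already 1 after the first pass, i > 0 is immediately true and only one row is ever printed. If the goal is to inspect the first few rows, here is a Python 3.8+ sketch that uses itertools.islice instead of a manual counter. The sample lines are hypothetical.

```python
import csv
import itertools

# Hypothetical lines standing in for an open file object.
lines = ['name,capacity', 'North Fork,25', 'Slate Creek,10', 'Lake,5']
reader = csv.DictReader(lines)

# Take at most two data rows (the header row only supplies the keys).
first_two = list(itertools.islice(reader, 2))
for row in first_two:
    print(row)
```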

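As for why they are blank: the default restval is None (see http://docs.python.org/3.2/library/csv.html), and DictReader uses restval for every field name the row has no value for, so the keys may not be matching what you expect. Try leaving out the fieldnames argument; you may find that the keys in this dialect are "col2", "col3", and so on.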
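A short Python 3.8+ sketch of the two behaviours mentioned here, using hypothetical data. Without fieldnames, DictReader takes the keys from the first row exactly as written, stray whitespace included. A row shorter than the header gets restval for each missing key.

```python
import csv

# Hypothetical lines with a header row; note the space after each comma.
lines = ['name, capacity', 'North Fork, 25', 'Slate Creek']

# Without fieldnames, the header row supplies the keys verbatim,
# including the leading space in ' capacity'.
rows = list(csv.DictReader(lines))
print(rows[0])  # {'name': 'North Fork', ' capacity': ' 25'}

# The last row has one field, so the missing key gets restval (None).
print(rows[1])  # {'name': 'Slate Creek', ' capacity': None}

# restval can be overridden to make such gaps easier to spot.
marked = list(csv.DictReader(lines, restval='MISSING'))
print(marked[1])  # {'name': 'Slate Creek', ' capacity': 'MISSING'}
```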

A handy little wrapper I use:

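def iter_trim(dict_iter):
    # Equivalent one-liner without error handling:
    # return (dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()])) for row in dict_iter)
    for row in dict_iter:
        try:
            # Strip surrounding whitespace from every key and value
            d = dict(zip([k.strip(" \t\n\r") for k in row.keys()],
                         [v.strip(" \t\n\r") for v in row.values()]))
            yield d
        except AttributeError:
            # A key or value that is not a string, e.g. None from restval
            print "row error:"
            print row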

Usage example:

import csv

def csv_iter(filename):
    csv_fp = open(filename)
    # Guess the dialect from the first 16 KB, then rewind
    guess_dialect = csv.Sniffer().sniff(csv_fp.read(16384))
    csv_fp.seek(0)
    csv_reader = csv.DictReader(csv_fp, dialect=guess_dialect)
    return iter_trim(csv_reader)

for row in csv_iter("some-file.csv"):
    # do something...
    print row
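Here is a Python 3.8+ variation on the same wrapper (a sketch, not the answerer's code). It strips only str keys and values. The None that DictReader fills in for a short row therefore passes through untouched instead of raising AttributeError, which the original's except clause would turn into a skipped row. The sample lines and the helper name _trim are hypothetical.

```python
import csv

WHITESPACE = " \t\n\r"


def _trim(value):
    # Strip only strings: DictReader may also yield None (restval, or the
    # default restkey used as a key) and lists (the restkey value).
    return value.strip(WHITESPACE) if isinstance(value, str) else value


def iter_trim(dict_iter):
    """Yield each row dict with whitespace stripped from keys and values."""
    for row in dict_iter:
        yield {_trim(k): _trim(v) for k, v in row.items()}


# Hypothetical lines with stray spaces and one short row.
lines = ['name , capacity', 'North Fork , 25', 'Slate Creek']
trimmed = list(iter_trim(csv.DictReader(lines)))
print(trimmed)
```

One design caveat: if two keys differ only in surrounding whitespace, they collapse into one key after trimming, and the later value wins.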

Python csv DictReader returns "None" for field values; any ideas?
