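I'm a novice programmer having trouble parsing a CSV file with Python's csv module. The problem is that my output shows the field values in a row as None for every field except the first.
Here is the first row of the ugly CSV file I'm trying to parse (the remaining rows follow the same format):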
0,213726,NORTH FORK SLATE CREEK,CAMPGROUND,North Fork Slate Creek Campground | Idaho | Public Lands Information Center | Recreation Search, http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268,NA,NA,NA,NA,(208)839-2211,"Nez Perce National Forest Operating Days: 305<br>Total Capacity: 25<br>
5 campsites at the confluence of Slate Creek and its North Fork. A number of trails form loops in the area. These are open to most traffic, including trail bikes.","From Slate Creek, go 8 miles east on Forest Road 354.",NA,http://www.publiclands.org/explore/reg_nat_forest.php?region=7&forest_name=Nez%20Perce%20National%20Forest,NA,NA,NA,45.6,-116.1,NA,N,0,1103,2058
Here is the code I wrote to parse the CSV file (it doesn't work!):
import csv
#READER SETTINGS
f_path = '/Users/foo'
f_handler = open(f_path, 'rU').read().replace('\n',' ')
my_fieldnames = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7',
'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15',
'col16', 'col17', 'col18', 'col19', 'col20', 'col21', 'col22', 'col23',
'col24','col25']
f_reader = csv.DictReader(f_handler, fieldnames=my_fieldnames, delimiter=',', dialect=csv.excel)
#NOW I TRY TO PARSE THE CSV FILE
i = 0
for row in f_reader:
    print "my first row was %s" % row
    i = i + 1
    if i > 0:
        break
Here is the output. It says that every field except the first is blank, and I don't know why! Any suggestions would be greatly appreciated.
my first row was {'col14': None, 'col15': None, 'col16': None,
'col17': None, 'col10': None, 'col11': None, 'col12': None,
'col13': None, 'col18': None, 'col19': None, 'col2': None, 'col8': None,
'col9': None, 'col6': None, 'col7': None, 'col4': None, 'col5': None,
'col3': None, 'col1': '0', 'col25': None, 'col24': None,
'col21': None, 'col20': None, 'col23': None, 'col22': None}
Answer 0 (score: 3)
Try this:
#!/usr/bin/env python
import csv
my_fieldnames = ['col' + str(i) for i in range(1,26)]
with open('input.csv', 'rb') as csvfile:
    my_reader = csv.DictReader(csvfile, fieldnames=my_fieldnames,
                               delimiter=',', dialect=csv.excel,
                               quoting=csv.QUOTE_NONE)
    for row in my_reader:
        for k, v in row.iteritems():
            print k, v
Output for the first line of input (keep in mind that dictionaries are unordered):
col14 None
col15 None
col16 None
col17 None
col10 NA
col11 (208)839-2211
col12 "Nez Perce National Forest Operating Days: 305<br>Total Capacity: 25<br>
col13 None
col18 None
col19 None
col8 NA
col9 NA
col6 http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268
col7 NA
col4 CAMPGROUND
col5 North Fork Slate Creek Campground | Idaho | Public Lands Information Center | Recreation Search
col2 213726
col3 NORTH FORK SLATE CREEK
col1 0
col25 None
col24 None
col21 None
col20 None
col23 None
col22 None
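The output above stops at col12 because, with quoting=csv.QUOTE_NONE, the quote character is treated as ordinary text and the line break inside the quoted description ends the record. As a rough illustration only, here is a minimal sketch in Python 3 syntax (the thread's code is Python 2) that compares the default quoting with QUOTE_NONE. The sample data and the read_rows helper are made up for this example and are not part of the original file or answer.

```python
import csv
import io

# Hypothetical two-record sample; the first record has a quoted field
# that contains a line break and a comma.
SAMPLE = (
    '0,CAMPGROUND,"Total Capacity: 25<br>\n'
    '5 campsites, open to trail bikes.",45.6\n'
    '1,TRAILHEAD,"No line break here",46.1\n'
)


def read_rows(text, **reader_options):
    """Parse CSV text and return the rows as a list of lists."""
    # newline='' leaves line endings untouched so the csv module can
    # handle them itself.
    return list(csv.reader(io.StringIO(text, newline=''), **reader_options))


default_rows = read_rows(SAMPLE)
unquoted_rows = read_rows(SAMPLE, quoting=csv.QUOTE_NONE)

print(len(default_rows))   # number of records seen with default quoting
print(len(unquoted_rows))  # number of records seen with QUOTE_NONE
```

With the default quoting the embedded line break stays inside one field, so leaving the quoting argument out may be the better fit for a file like the one in the question.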
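Answer 1 (score: 2)
What different software systems call CSV varies a great deal. Fortunately, Python's excellent csv module is very good at handling these details, so you don't have to deal with them by hand.
Let me emphasize something that @metaperture's answer uses but doesn't explain: you can take all the guesswork out of reading CSV files in Python by auto-detecting the dialect. Once you've nailed that part, there isn't much left that can go wrong.
Let me give a simple example: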
import csv
with open(filename, 'rb') as csvfile:
    # Guess the dialect from a sample of the file, then rewind.
    dialect = csv.Sniffer().sniff(csvfile.read(10024))
    csvfile.seek(0)
    qreader = csv.reader(csvfile, dialect)
    cnt = 0
    for item in qreader:
        if cnt > 0:
            # process your data
            pass
        else:
            # the header of the csv file (field names)
            pass
        cnt = cnt + 1
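To make the dialect detection easier to try out without a file on disk, here is a minimal sketch in Python 3 syntax that runs the same Sniffer step on an in-memory sample. The semicolon-delimited data, the sniff_and_read helper, and the choice to restrict the candidate delimiters are all assumptions made for this illustration.

```python
import csv
import io

# Hypothetical semicolon-delimited sample with a header row.
SAMPLE = (
    "site;state;capacity\n"
    "North Fork Slate Creek;ID;25\n"
    "Rocky Bluff;ID;4\n"
    "Castle Creek;ID;8\n"
)


def sniff_and_read(text, sample_size=1024):
    """Detect the dialect from the start of the text, then parse all rows."""
    # Restricting the candidate delimiters is optional; it only narrows
    # what the sniffer is allowed to pick.
    dialect = csv.Sniffer().sniff(text[:sample_size], delimiters=";,\t")
    rows = list(csv.reader(io.StringIO(text, newline=""), dialect))
    return dialect, rows


dialect, rows = sniff_and_read(SAMPLE)
header, records = rows[0], rows[1:]

print(repr(dialect.delimiter))
print(header)
print(records)
```

Splitting the result into header and records plays the same role as the cnt counter in the loop above.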
Answer 2 (score: 0)
When you do this:
f_handler = open(f_path, 'rU').read().replace('\n',' ')
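you are removing all the newline characters, which is how the csv.excel dialect detects new rows. Since the file then consists of a single line, it only returns once.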
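A related detail worth checking: read() returns a string, and the csv readers simply iterate over whatever object they are given. Iterating over a string yields one character at a time, so the reader can end up treating "0" as the entire first line. The following is a minimal sketch in Python 3 syntax with made-up three-column data (TEXT and FIELDS are not from the original file) that compares passing a string with passing a file-like object.

```python
import csv
import io

# Hypothetical three-column sample standing in for the real file.
TEXT = "0,213726,CAMPGROUND\n1,213727,TRAILHEAD\n"
FIELDS = ["col1", "col2", "col3"]

# A str is iterated one character at a time, so the reader sees "0" as
# the whole first line.
from_string = next(csv.DictReader(TEXT, fieldnames=FIELDS))

# A file-like object is iterated one line at a time, which is what the
# reader expects.
from_file = next(
    csv.DictReader(io.StringIO(TEXT, newline=""), fieldnames=FIELDS)
)

print(from_string)
print(from_file)
```

Passing the open file object itself to csv.DictReader, rather than the result of read(), avoids the character-by-character iteration.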
Also, you are doing:
if i > 0:
    break
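which terminates your for loop after the first iteration.
As for why they are blank, the default restval is None (see http://docs.python.org/3.2/library/csv.html), so the keys probably don't match. Try leaving out the fieldnames argument; you will probably find that the keys in this dialect are " col2", " col3", and so on.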
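To show what restval does, here is a minimal sketch in Python 3 syntax: when a record has fewer values than there are field names, DictReader fills the remaining keys with restval, which defaults to None. The sample data, the FIELDS list, and the read_dicts helper are assumptions for this illustration.

```python
import csv
import io

# Hypothetical data: the second record is missing its last two fields.
TEXT = "0,213726,CAMPGROUND\n1\n"
FIELDS = ["col1", "col2", "col3"]


def read_dicts(text, restval=None):
    """Return every record as a dict, filling missing fields with restval."""
    reader = csv.DictReader(
        io.StringIO(text, newline=""), fieldnames=FIELDS, restval=restval
    )
    return list(reader)


default_fill = read_dicts(TEXT)
custom_fill = read_dicts(TEXT, restval="MISSING")

print(default_fill[1])
print(custom_fill[1])
```

Setting restval to a visible marker like this can make short records easier to spot while debugging.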
A cute little wrapper I use:
def iter_trim(dict_iter):
    #return (dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()])) for row in dict_iter)
    for row in dict_iter:
        try:
            # Strip surrounding whitespace from every key and value.
            d = dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()]))
            yield d
        except Exception:
            # strip() fails on non-string values such as None; report the row and move on.
            print "row error:"
            print row
Usage example:
import csv
def csv_iter(filename):
    csv_fp = open(filename)
    # Guess the dialect from the first 16 KB, then rewind.
    guess_dialect = csv.Sniffer().sniff(csv_fp.read(16384))
    csv_fp.seek(0)
    csv_reader = csv.DictReader(csv_fp, dialect=guess_dialect)
    return iter_trim(csv_reader)
for row in csv_iter("some-file.csv"):
    # do something...
    print row
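One limitation of the wrapper is that it reports an error for any row containing None, which is what DictReader produces for short records. As a possible variation, and not the answer author's code, here is a minimal sketch in Python 3 syntax that leaves non-string items untouched instead. The name iter_trim_safe and the sample data are hypothetical.

```python
import csv
import io


def iter_trim_safe(dict_iter, chars=" \t\n\r"):
    """Yield each row with whitespace stripped from string keys and values."""
    def trim(item):
        # Leave None (or any other non-string) untouched.
        return item.strip(chars) if isinstance(item, str) else item

    for row in dict_iter:
        yield {trim(key): trim(value) for key, value in row.items()}


# Hypothetical sample: padded header names and a short second record.
TEXT = "name , state\n Slate Creek , ID \nRocky Bluff\n"
rows = list(iter_trim_safe(csv.DictReader(io.StringIO(TEXT, newline=""))))

for row in rows:
    print(row)
```

Short records then come through with None for the missing fields rather than being skipped with an error message.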