I have this CSV file and I want to convert it into a dictionary. The file contains 17,584,980 rows:
ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp,avgMeasuredTime,avgSpeed,extID,medianMeasuredTime,TIMESTAMP:1,vehicleCount,_id,REPORT_ID,Lat1,Long1,Lat2,Long2,Distance between 2 points,duration of measurements,ndt in kmh
127,38,62,22,39,10.1050,56.2317,1406859600,74,50,668,74,1406859600,5,20746220,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
122,35,61,17,34,10.1050,56.2317,1406859900,73,50,668,73,1406859900,6,20746392,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
117,36,65,24,34,10.1050,56.2317,1406860200,61,60,668,61,1406860200,4,20746723,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
What I tried:
# code to generate dictionaries from csv file
import csv

reader = csv.DictReader(open('resultsout.csv'))
output = open("finaldata.py", "w")
result = {}
for row in reader:
    for column, value in row.iteritems():
        result.setdefault(column, []).append(float(value))
output.write(str(result))
Error:
Traceback (most recent call last):
  File "dictionaries.py", line 11, in <module>
    result.setdefault(column, []).append(float(value))
ValueError: invalid literal for float(): 32
But this code worked before.

Answer (score: 1)
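Although this is an unsafe way to do what you want (not to mention that there is little reason to turn a huge CSV file into a huge Python file), your code should work once you fix the indentation. The problem comes from a part of your data that you haven't shown here: some of its values are malformed (for example `32\x00` or `32\x07`) and cannot be converted to a float. The invisible trailing character is also why the error message appears to complain about a plain `32`.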
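To locate the offending values in the full file, a diagnostic pass along these lines could report every cell that `float()` rejects, using `repr()` so that invisible characters become visible. This is a Python 3 sketch; `find_bad_values` and its `limit` parameter are hypothetical names. The sample uses `\x07` rather than `\x00`, since in some Python versions the `csv` module raises its own error on NUL bytes before `float()` is ever reached.

```python
import csv
import io


def find_bad_values(fileobj, limit=10):
    """Return up to `limit` (line_number, column, repr(value)) tuples for
    cells that float() cannot parse. Hypothetical helper for illustration."""
    reader = csv.DictReader(fileobj)
    bad = []
    for row in reader:
        for column, value in row.items():
            try:
                float(value)
            except (TypeError, ValueError):
                # reader.line_num counts source lines read so far,
                # i.e. the line on which this row ended.
                bad.append((reader.line_num, column, repr(value)))
                if len(bad) >= limit:
                    return bad
    return bad


sample = io.StringIO("ozone,vehicleCount\n127,5\n32\x07,6\n")
print(find_bad_values(sample))
```

On the real data, the same function would be called with an open file, e.g. `find_bad_values(open("resultsout.csv"))`.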
Here is how to handle it:
import csv

DEFAULT = 0.0  # value to use when conversion fails

with open("resultsout.csv", "r") as i:
    reader = csv.DictReader(i)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column, value in row.items():
            try:
                result[column].append(float(value))
            except ValueError:
                result[column].append(DEFAULT)

with open("finaldata.py", "w") as o:
    o.write(str(result))
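The same try/except idea can be wrapped in a function that takes any file object, which makes it easy to test on a small in-memory sample before running it on the full file. This is a Python 3 sketch; `load_columns` and its `default` parameter are hypothetical names, not part of the answer's code.

```python
import csv
import io

DEFAULT = 0.0  # value to use when conversion fails


def load_columns(fileobj, default=DEFAULT):
    """Read a CSV into a dict mapping each column name to a list of floats,
    substituting `default` for any value float() rejects."""
    reader = csv.DictReader(fileobj)
    result = {name: [] for name in reader.fieldnames}
    for row in reader:
        for column, value in row.items():
            try:
                result[column].append(float(value))
            except ValueError:
                result[column].append(default)
    return result


sample = io.StringIO("ozone,vehicleCount\n127,5\n32\x07,6\n")
print(load_columns(sample))
# {'ozone': [127.0, 0.0], 'vehicleCount': [5.0, 6.0]}
```

Note that `0.0` makes a bad reading indistinguishable from a genuine zero; if that matters, `float("nan")` is a possible alternative default.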
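Alternatively, you can strip non-numeric characters before converting, so that a few stray non-printable characters cannot make the conversion fail: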
import csv
import re

STRIP_CHARS = re.compile(r"[^\d.]+")  # matches every character except digits and dots

with open("resultsout.csv", "r") as i:
    reader = csv.DictReader(i)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column, value in row.items():
            result[column].append(float(STRIP_CHARS.sub("", value)))

with open("finaldata.py", "w") as o:
    o.write(str(result))
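To see what the stripping step does, it can be applied to a few sample strings. The pattern `[^\d.]+` also removes a leading minus sign, so a negative value would silently become positive; the variant `STRIP_CHARS_SIGNED` below keeps `-`, which is an assumption of this sketch and not part of the answer. `clean_float` is a hypothetical helper name.

```python
import re

# Pattern from the answer: drop everything except digits and dots.
STRIP_CHARS = re.compile(r"[^\d.]+")
# Hypothetical variant that also keeps a minus sign for negative values.
STRIP_CHARS_SIGNED = re.compile(r"[^\d.\-]+")


def clean_float(value, pattern=STRIP_CHARS_SIGNED):
    """Remove unwanted characters from `value`, then convert it to float."""
    return float(pattern.sub("", value))


print(STRIP_CHARS.sub("", "32\x07"))     # prints 32
print(clean_float("32\x00"))             # prints 32.0
print(STRIP_CHARS.sub("", "-10.1050"))   # prints 10.1050 (sign lost)
print(clean_float("-10.1050"))           # prints -10.105
```

Stripping alone can still fail: a cell containing no digits becomes an empty string, and `float("")` raises `ValueError`.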
Or you can combine the two for maximum robustness.
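One way to combine the two approaches is to strip stray characters first and fall back to a default only if `float()` still fails. This is a Python 3 sketch; `to_float` and `load_columns` are hypothetical names, and the sign-preserving pattern is an assumption rather than the answer's original `[^\d.]+`.

```python
import csv
import io
import re

DEFAULT = 0.0  # fallback when a value still cannot be parsed
STRIP_CHARS = re.compile(r"[^\d.\-]+")  # keeps digits, dots and minus signs


def to_float(value, default=DEFAULT):
    """Strip stray characters, then fall back to `default` if conversion fails."""
    try:
        return float(STRIP_CHARS.sub("", value))
    except (TypeError, ValueError):
        # TypeError covers None (a short row); ValueError covers strings like ""
        return default


def load_columns(fileobj):
    """Read a CSV into a dict of column name -> list of floats."""
    reader = csv.DictReader(fileobj)
    result = {name: [] for name in reader.fieldnames}
    for row in reader:
        for column, value in row.items():
            result[column].append(to_float(value))
    return result


sample = io.StringIO("ozone,vehicleCount\n32\x07,5\nn/a,6\n")
print(load_columns(sample))
# {'ozone': [32.0, 0.0], 'vehicleCount': [5.0, 6.0]}
```

Here `32\x07` is rescued by stripping, while `n/a` strips down to an empty string and falls back to `DEFAULT`.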