CSV to dictionary conversion

Date: 2017-05-22 10:57:27

Tags: python csv dictionary

I have this CSV file and I want to convert it into a dictionary. The file contains 17584980 rows:

ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp,avgMeasuredTime,avgSpeed,extID,medianMeasuredTime,TIMESTAMP:1,vehicleCount,_id,REPORT_ID,Lat1,Long1,Lat2,Long2,Distance between 2 points,duration of measurements,ndt in kmh
127,38,62,22,39,10.1050,56.2317,1406859600,74,50,668,74,1406859600,5,20746220,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
122,35,61,17,34,10.1050,56.2317,1406859900,73,50,668,73,1406859900,6,20746392,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
117,36,65,24,34,10.1050,56.2317,1406860200,61,60,668,61,1406860200,4,20746723,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71

What I tried:

#code to generate dictionaries from csv file
import csv

reader = csv.DictReader(open('resultsout.csv'))

output = open("finaldata.py","w")

result = {}
for row in reader:
    for column, value in row.items():
        result.setdefault(column, []).append(float(value))

output.write(str(result))

Error:

Traceback (most recent call last):
  File "dictionaries.py", line 11, in <module>
    result.setdefault(column, []).append(float(value))
ValueError: invalid literal for float(): 32

But this code worked before.

1 answer:

Answer 0 (score: 1)

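Although this is an unsafe way to do what you want (not to mention there is little reason to turn a huge CSV into a huge Python file), the code should work as long as the inner loop is indented correctly. The problem comes from part of your data that you haven't shown here: some of the values are bad (e.g. 32\x0032\x07) and cannot be converted to floats. Non-printable characters such as \x00 and \x07 are not visible when the error message is printed, which would explain why it appears to complain about a plain 32.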
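To see which values are actually bad before deciding how to handle them, a quick diagnostic pass can help. The following is a minimal sketch for Python 3, assuming a CSV with a header row like the one above; `find_bad_values` is a hypothetical helper name. It collects the line number, column and raw value of every field that `float()` rejects, so the values can be printed with `repr()` to make hidden characters visible.

```python
import csv


def find_bad_values(path, limit=20):
    """Return up to `limit` (line_number, column, value) tuples whose value
    cannot be parsed by float()."""
    bad = []
    with open(path, "r", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            for column, value in row.items():
                try:
                    float(value)
                except (TypeError, ValueError):
                    # reader.line_num is the number of the last physical line read
                    bad.append((reader.line_num, column, value))
                    if len(bad) >= limit:
                        return bad
    return bad
```

For example, looping over `find_bad_values("resultsout.csv")` and printing `repr(value)` for each hit shows control characters as escapes such as `'\x07'` instead of printing them invisibly.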

One way to handle it is to fall back to a default value whenever the conversion fails:

import csv

DEFAULT = 0.0  # value to use when conversion fails

with open("resultsout.csv", "r") as i:
    reader = csv.DictReader(i)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column, value in row.items():
            try:
                result[column].append(float(value))
            except (TypeError, ValueError):
                # TypeError covers None, which DictReader uses for missing fields
                result[column].append(DEFAULT)
    with open("finaldata.py", "w") as o:
        o.write(str(result))
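If the goal is only to store the dictionary and load it again later, a data format avoids generating and importing a huge `.py` file. One option, sketched here for Python 3 with hypothetical helper names `save_columns` and `load_columns`, is the standard `json` module: the column lists are written as JSON and read back without executing any generated code.

```python
import json


def save_columns(result, path):
    """Write a {column: [float, ...]} dict to `path` as JSON."""
    with open(path, "w") as f:
        json.dump(result, f)


def load_columns(path):
    """Read the dict back; nothing is imported or executed, unlike a .py file."""
    with open(path, "r") as f:
        return json.load(f)
```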

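Alternatively, you can strip non-numeric characters before converting, so that a few stray non-printable characters don't make the conversion fail: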

import csv
import re

STRIP_CHARS = re.compile(r"[^\d.]+")

with open("resultsout.csv", "r") as i:
    reader = csv.DictReader(i)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column, value in row.items():
            # Note: this also removes minus signs and exponent markers such as "e"
            result[column].append(float(STRIP_CHARS.sub("", value)))
    with open("finaldata.py", "w") as o:
        o.write(str(result))
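It is worth checking what the stripping pattern does to a few inputs before running it over millions of rows. This quick check reuses the same `STRIP_CHARS` pattern; the sample inputs are illustrative, not taken from the asker's data. Because everything other than digits and `.` is dropped, digits on either side of a stray byte are joined (the `32\x0032\x07` example becomes `3232`, not `32`), a minus sign is lost, and a value with no digits becomes an empty string that `float()` still rejects.

```python
import re

STRIP_CHARS = re.compile(r"[^\d.]+")

# Each raw input is paired with what STRIP_CHARS.sub("", raw) leaves behind.
samples = {
    "32\x07": "32",          # trailing control character removed: fine
    "32\x0032\x07": "3232",  # digits around the NUL byte are joined together
    "-3.5": "3.5",           # the minus sign is stripped, flipping the sign
    "1e5": "15",             # exponent notation is mangled
    "N/A": "",               # nothing left; float("") raises ValueError
}

for raw, expected in samples.items():
    assert STRIP_CHARS.sub("", raw) == expected, (raw, expected)
```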

Or you can combine the two for maximum reliability.
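A combined version could look like the sketch below (Python 3; `to_float` and `csv_to_columns` are hypothetical names). It tries the raw value first, which keeps negative numbers and exponents intact, then the stripped value, and only falls back to `DEFAULT` if both fail. Missing fields (`None`) are treated as unconvertible, and iterating over `reader.fieldnames` ignores any extra, unnamed fields in a malformed row.

```python
import csv
import re

DEFAULT = 0.0  # value to use when conversion fails
STRIP_CHARS = re.compile(r"[^\d.]+")


def to_float(value, default=DEFAULT):
    """Convert a CSV field to float: raw text first, then the text with
    non-numeric characters removed, then `default`."""
    if value is None:  # DictReader fills missing trailing fields with None
        return default
    try:
        return float(value)
    except ValueError:
        pass
    try:
        return float(STRIP_CHARS.sub("", value))
    except ValueError:
        return default


def csv_to_columns(path):
    """Read a CSV with a header row into {column_name: [float, ...]}."""
    with open(path, "r", newline="") as f:
        reader = csv.DictReader(f)
        result = {name: [] for name in reader.fieldnames}
        for row in reader:
            for column in reader.fieldnames:
                result[column].append(to_float(row[column]))
    return result
```

Trying `float(value)` before stripping means clean values never go through the regex, so the stripping step only affects fields that would otherwise be lost.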