Data analysis in Python won't iterate over records

Time: 2013-12-11 05:46:41

Tags: python

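I want to group records by the sum of their values, assign each group a unique number, and so on. That is what the script below is supposed to do, but when I run it from the shell I get the following error: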

line 5, in <module>
d = dict(reader(infile))
ValueError: dictionary update sequence element #0 has length 5; 2 is required

#!/usr/bin/python
from csv import reader
with open('file.csv',mode='r') as infile:
    d = dict(reader(infile))

dictf = {}
for key, value in d.iteritems():
    try:
        dictf[key] = float(value)
    except: pass

flag = 1
sum = 0
final = {}
sumpop = []

for key in sorted(d.iterkeys()):
    if 45000.0 < sum < 55000.0 or sum > 50000:
        flag += 1
        sumpop.append(sum)
        sum = 0
    sum += dictf[key]
    try:
        final [flag] += " " + key
    except:
        final [flag] = key

output = open("output.csv","w+")
output.write("TRACT,POPULATION,NUMBER,FLAG,SUMPOP\n")

for key,sum in zip(sorted(final.iterkeys()),sumpop):
    flag = "1"
    for value in final[key].split(" "):
        output.write( value + "," + dictf[value].__str__() + "," + key.__str__() + ","      +  flag + "," + sum.__str__() + "\n")
        flag = ""

output.close()

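output.csv should contain 100% of the input records, but each one will be assigned a Number (a group ID) marking it as part of a group of records whose combined population is roughly 50,000.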

1 Answer:

Answer 0: (score: 1)

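The csv reader function returns an iterator of lists. By default each list represents one row, and its elements are the fields in that row. dict() expects every element to be a 2-item pair, so rows with five fields raise the ValueError shown in the question.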
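To make that concrete, here is a minimal sketch in Python 3 syntax. The five-column sample data is an assumption, since the real layout of file.csv is not shown in the question; io.StringIO just stands in for the open file.

```python
import csv
import io

# Hypothetical five-column sample; the real file.csv layout is not shown.
sample = "TRACT,POPULATION,A,B,C\n1001,1200,x,y,z\n1002,3400,x,y,z\n"

rows = list(csv.reader(io.StringIO(sample)))
# Each row is a list with one string per field.
first_row_length = len(rows[0])

try:
    dict(rows)  # dict() needs 2-item sequences, but these rows have 5 items
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Keeping only two columns per row (and skipping the header)
# gives dict() the pairs it expects.
pairs = dict((row[0], row[1]) for row in rows[1:])
```

Here error_message ends up holding the same kind of "has length 5; 2 is required" message as the traceback, while pairs is built without error.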

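Reading a csv file into a dictionary has to be done a little differently, depending on the structure of the csv file, for example: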

from csv import reader
d = {}
with open('file.csv',mode='r') as infile:
    for idx, line in enumerate(reader(infile)):
        # idx 0 is the header row, so skip it
        if idx:
            # first column becomes the key, second column the value
            d[line[0]] = line[1]

Edit: added skipping of the first row after seeing the shared csv file.
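As a variation on the same idea, the header can also be dropped with next() before looping, and the float conversion from the question can be folded into the read. This is only a sketch in Python 3 syntax: read_population, key_col and value_col are hypothetical names, and the sample data is made up because the real file is not shown.

```python
import csv
import io

def read_population(handle, key_col=0, value_col=1):
    """Map one column to another as floats, skipping the header row."""
    rows = csv.reader(handle)
    next(rows, None)  # drop the header; does nothing on an empty file
    result = {}
    for row in rows:
        try:
            result[row[key_col]] = float(row[value_col])
        except (ValueError, IndexError):
            continue  # skip rows with a missing or non-numeric value
    return result

# Hypothetical data standing in for file.csv
sample = "TRACT,POPULATION,A,B,C\n1001,1200,x,y,z\n1002,n/a,x,y,z\n1003,3400,x,y,z\n"
population = read_population(io.StringIO(sample))
```

With a real file you would pass the handle from open('file.csv', newline='') instead of the StringIO object. Catching only ValueError and IndexError, rather than a bare except, keeps unrelated errors visible.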