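I want to group records by the sum of their values, assign each group a unique number, and so on. That is what the script below is supposed to do, but when I run it in the shell I get the following error: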
line 5, in <module>
d = dict(reader(infile))
ValueError: dictionary update sequence element #0 has length 5; 2 is required
#!/usr/bin/python
from csv import reader
with open('file.csv',mode='r') as infile:
    d = dict(reader(infile))
dictf = {}
for key, value in d.iteritems():
    try:
        dictf[key] = float(value)
    except: pass
flag = 1
sum = 0
final = {}
sumpop = []
for key in sorted(d.iterkeys()):
    if 45000.0 < sum < 55000.0 or sum > 50000:
        flag += 1
        sumpop.append(sum)
        sum = 0
    sum += dictf[key]
    try:
        final [flag] += " " + key
    except:
        final [flag] = key
output = open("output.csv","w+")
output.write("TRACT,POPULATION,NUMBER,FLAG,SUMPOP\n")
for key,sum in zip(sorted(final.iterkeys()),sumpop):
    flag = "1"
    for value in final[key].split(" "):
        output.write( value + "," + dictf[value].__str__() + "," + key.__str__() + "," + flag + "," + sum.__str__() + "\n")
        flag = ""
output.close()
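output.csv should contain 100% of the input records, each assigned a NUMBER (a group ID) identifying the group it belongs to, where each group is a run of records whose total population is about 50,000.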
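The grouping described above can be sketched as a small pure function. This is a minimal sketch, not the asker's script: it is written for Python 3 (the question's code is Python 2, e.g. iteritems/iterkeys), it assumes the records have already been parsed into (key, population) pairs, and both the helper name assign_groups and the "close a group once its total reaches target" rule are illustrative assumptions, since the question only says "about 50,000".

```python
def assign_groups(records, target=50000.0):
    """Assign consecutive group IDs to (key, population) pairs.

    Hypothetical helper. Records are visited in sorted key order; a group
    is closed as soon as its running total reaches `target` (an assumed
    rule). Returns (key, population, group_id, group_total) tuples.
    """
    result = []
    group_id = 1
    members = []  # (key, population) pairs in the group being built
    total = 0.0   # running population total of that group
    for key, pop in sorted(records):
        members.append((key, pop))
        total += pop
        if total >= target:
            # Close the group: every member gets the same ID and total.
            result.extend((k, p, group_id, total) for k, p in members)
            group_id += 1
            members, total = [], 0.0
    if members:  # keep the final, possibly smaller, group too
        result.extend((k, p, group_id, total) for k, p in members)
    return result


if __name__ == "__main__":
    sample = [("a", 30000.0), ("b", 25000.0), ("c", 10000.0)]
    for row in assign_groups(sample):
        print(row)
```

Two differences from the question's script are deliberate. The script tests the running total before adding the current record, and its condition 45000.0 < sum < 55000.0 or sum > 50000 reduces to sum > 45000. It also appends to sumpop only when a group is closed, so the last group never gets a SUMPOP value and zip() drops its records from the output; the sketch keeps that final group so every input record is written.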
Answer 0 (score: 1)
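The csv reader function returns an iterator that yields one list per row; by default each list's elements are the comma-separated fields of that row. That is why dict(reader(infile)) fails: dict() expects every element to be a two-item (key, value) pair, but each row of this file has five fields.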
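The failure can be reproduced without the real file. This sketch (Python 3) feeds an in-memory, hypothetical five-column CSV to csv.reader through io.StringIO; the column names and the helper names rows and try_dict are illustrative only.

```python
import csv
import io

# Hypothetical five-column data, mirroring "has length 5" in the traceback.
SAMPLE = "TRACT,POPULATION,COL3,COL4,COL5\n1001,4200,x,y,z\n"


def rows(text):
    """Return what csv.reader yields: one list of strings per line."""
    return list(csv.reader(io.StringIO(text)))


def try_dict(text):
    """Attempt dict(csv.reader(...)); return the error message on failure."""
    try:
        return dict(csv.reader(io.StringIO(text)))
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    print(rows(SAMPLE)[0])
    print(try_dict(SAMPLE))          # five fields per row: ValueError
    print(try_dict("a,1\nb,2\n"))    # two fields per row: a valid dict
```

With exactly two fields per row, dict() accepts the rows directly; any other width raises the ValueError from the question.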
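Reading a CSV file into a dictionary therefore has to be done a little differently, depending on the file's structure, for example: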
from csv import reader
d = {}
with open('file.csv',mode='r') as infile:
    for idx, line in enumerate(reader(infile)):
        if idx:  # idx 0 is the header row, so skip it
            d[line[0]] = line[1]  # first column -> second column
Edit: added skipping of the first (header) row after seeing the shared CSV file.
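A variant of the same idea uses csv.DictReader, which consumes the header row itself, so no index check is needed. This is a sketch under assumptions: it is Python 3, the column names TRACT and POPULATION are guessed from the question's output header (not confirmed for the input file), and read_populations is a hypothetical helper name. It also converts populations to float up front, which the question's script does in a separate loop.

```python
import csv
import io


def read_populations(infile, key_col="TRACT", value_col="POPULATION"):
    """Map each row's key column to its value column as a float.

    Hypothetical helper; the default column names are assumptions.
    csv.DictReader reads the first line as field names, so the header
    row never ends up in the result. Rows whose value is not numeric
    (e.g. an empty cell) are skipped.
    """
    result = {}
    for row in csv.DictReader(infile):
        try:
            result[row[key_col]] = float(row[value_col])
        except ValueError:
            continue  # empty or non-numeric population cell
    return result


if __name__ == "__main__":
    sample = io.StringIO("TRACT,POPULATION,A,B,C\n1001,4200,x,y,z\n1002,,x,y,z\n")
    print(read_populations(sample))
```

With a real file, open it with newline='' as the csv module documentation recommends, e.g. with open('file.csv', newline='') as infile: d = read_populations(infile).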