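I am generating a nested dictionary in my program. After generating it, I want to iterate over the dictionary and inspect its keys and values.
Program code
This is the dictionary I want to iterate over; its values are themselves dictionaries.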
main_dict = {101: {1234: [11111, 11111], 5678: [44444, 44444]},
             102: {9100: [55555, 55555], 1112: [77777, 88888]}}
I am reading a CSV file and storing its contents in this dictionary, like this:
Input.csv -
lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888
This is the input CSV file. I am reading it, and I want to know how many times each unique item total is repeated.
To do that, I do this:
for line in reader:
    if line[0] in main_dict:
        if line[1] in main_dict[line[0]]:
            main_dict[line[0]][line[1]].append(line[2])
        else:
            main_dict[line[0]].update({line[1]:[line[2]]})
    else:
        main_dict[line[0]] = {line[1]:[line[2]]}
print main_dict
Output of the above program:
{101: {1234: [11111,11111],5678: [44444,44444]},
102: {9100: [55555,55555],1112: [77777,88888]}}
But I get the following error on this line -
if line[1] in main_dict[line[0]]:
IndexError: list index out of range
Iterating over main_dict -
for key,value in main_dict.iteritems():
    f1 = open(outputfile + op_directory +'/'+ key+'.csv', 'w')
    writer1 = csv.DictWriter(f1, delimiter=',', fieldnames = fieldname)
    writer1.writeheader()
    if type(value) == type({}):
        for k,v in value.iteritems():
            if type(v) == type([]):
                set1 = set(v)
                for se in set1:
                    writer1.writerow({'item':k,'total':se,'total_count':v.count(se)})
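What is the best way to iterate over this kind of dictionary?
Sometimes I get the correct result, like the dictionary above, but many times I hit this error. What am I missing?
Thanks in advance!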
Answer 0 (score: 0)
As the comments pointed out, you do not check that line has length 3:
for line in reader:
    if len(line) != 3:
        continue
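A likely cause of the intermittent error is a blank or truncated row: csv.reader returns an empty list for an empty line, so line[1] raises IndexError. Here is a minimal sketch of the effect of the length check, written for Python 3 and using made-up in-memory data (the `sample` string is an assumption, not your real file):

```python
import csv
import io

# Hypothetical sample data: note the blank line and the row with only two fields.
sample = "lineno,item,total\n101,1234,11111\n\n101,5678\n102,9100,55555\n"

rows = list(csv.reader(io.StringIO(sample)))

# The blank line comes back as [], so indexing row[1] on it would raise IndexError.
kept = [row for row in rows if len(row) == 3]
skipped = [row for row in rows if len(row) != 3]

print(kept)     # rows that are safe to unpack into three fields
print(skipped)  # rows the length check filters out
```

Printing the skipped rows once on your real file should show whether blank lines or short rows are what you are hitting.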
As for your algorithm, I would use nested defaultdicts to avoid the if/else branches.
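To isolate the idea, here is a small sketch (Python 3) in which a nested defaultdict replaces the three-way if/else. The `rows` list is hypothetical and stands in for lines that have already been read and validated:

```python
from collections import defaultdict

# Hypothetical rows, already split into (lineno, item, total) strings.
rows = [
    ("101", "1234", "11111"),
    ("101", "1234", "11111"),
    ("102", "1112", "77777"),
]

# Missing keys are created on first access, so no membership tests are needed.
nested = defaultdict(lambda: defaultdict(list))
for lineno, item, total in rows:
    nested[lineno][item].append(total)

# Convert the inner defaultdicts back to plain dicts for a clean printout.
plain = {lineno: dict(items) for lineno, items in nested.items()}
print(plain)
```

One thing to keep in mind: merely reading a missing key from a defaultdict creates it, so use `in` or `.get()` when you only want to test for presence.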
Edit: after the question was edited, I added a new defaultdict and the CSV-writing part:
from collections import defaultdict
import csv
counter = defaultdict(lambda: defaultdict(list))
main_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
fieldnames = ['item', 'total', 'total_count']
# we suppose reader is a csv.reader object
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for line in reader:
        if len(line) != 3:
            continue
        # Remove unwanted spaces
        lineno, item, total = [el.strip() for el in line]
        # Do not deal with non digit entries (title for example)
        if not lineno.isdigit():
            continue
        counter[lineno][item].append(total)
        csvdict = {'item': item,
                   'total': total,
                   'total_count': counter[lineno][item].count(total)}
        main_dict[lineno][item][total].update(csvdict)
# The writing part
for lineno in sorted(main_dict):
    itemdict = main_dict[lineno]
    output = 'output_%s.csv' % lineno
    with open(output, 'wb') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=',')
        writer.writeheader()
        for totaldict in itemdict.values():
            for csvdict in totaldict.values():
                writer.writerow(csvdict)
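On the "best way to iterate" part of the question: one option is to put the nested loops in a generator so the writing code only sees flat rows. The following is a sketch for Python 3, not a drop-in replacement; `iter_rows` is a hypothetical helper name, the dictionary is typed in by hand with string keys (csv yields strings), and the output goes to an in-memory buffer rather than a real file:

```python
import csv
import io

# Hand-written dict shaped like the one in the question, with string keys and values.
main_dict = {
    "101": {"1234": ["11111", "11111"], "5678": ["44444", "44444"]},
    "102": {"9100": ["55555", "55555"], "1112": ["77777", "88888"]},
}

def iter_rows(nested):
    """Yield (lineno, row_dict) for every distinct total under every item."""
    for lineno, items in sorted(nested.items()):
        for item, totals in sorted(items.items()):
            for total in sorted(set(totals)):
                yield lineno, {"item": item,
                               "total": total,
                               "total_count": totals.count(total)}

# In-memory stand-in for one output file; on Python 3 a real file would be
# opened with open(path, "w", newline="") instead of mode "wb".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["item", "total", "total_count"])
writer.writeheader()
for lineno, row in iter_rows(main_dict):
    if lineno == "102":
        writer.writerow(row)

print(buffer.getvalue())
```

Using `.items()` and a generator avoids the type() checks, because the shape of the data is fixed by how the dictionary was built.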
You can then print a readable representation of the result with the following function:
def myprint(obj, ntab=0):
    if isinstance(obj, (dict, defaultdict)):
        for k in sorted(obj):
            myprint('%s%s'%(ntab*' ', k), ntab+1)
            myprint(obj[k], ntab+1)
    else:
        print('%s%s'%(ntab*' ', obj))
myprint(main_dict)
But if you want to count the item totals, I would use another defaultdict, with total as the key and a list of (lineno, item) tuples as the value:
from collections import defaultdict
import csv
total_dict = defaultdict(list)
# we suppose reader is a csv.reader object
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for line in reader:
        if len(line) != 3:
            continue
        # Remove unwanted spaces
        lineno, item, total = [el.strip() for el in line]
        # Do not deal with non digit entries (title for example)
        if not lineno.isdigit():
            continue
        total_dict[total].append((lineno, item))
You can then get the count for each total very easily:
>>> print len(total_dict['55555'])
2
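If all you need is the repeat count per (lineno, item, total), collections.Counter is another option. This is a different technique from the defaultdict above, sketched here for Python 3 with the question's Input.csv pasted into a string instead of read from disk:

```python
import csv
import io
from collections import Counter

# In-memory copy of the Input.csv shown in the question.
data = """lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888
"""

counts = Counter()
for row in csv.reader(io.StringIO(data)):
    if len(row) != 3:
        continue  # skip blank or malformed rows
    lineno, item, total = (field.strip() for field in row)
    if not lineno.isdigit():
        continue  # skip the header row
    counts[(lineno, item, total)] += 1

for (lineno, item, total), n in sorted(counts.items()):
    print(lineno, item, total, n)
```

The Counter keeps one integer per distinct row, so it avoids storing every total in a list and calling count() on it repeatedly.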