Multiple key-value pairs in a dictionary of dictionaries

Time: 2016-04-19 17:10:43

Tags: python python-2.7 dictionary string-formatting

I have a csv file with a bunch of rows in the form Individual#,ResultType,Count, like this:

1,RESULT004,171
1,RESULT005,71
2,RESULT001,12
2,RESULT004,981
......

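My goal is to end up with a human-readable table with the individuals on the rows and the count for each result type in the columns. If an individual has no count for a result, I want a zero there. Like this: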

Individual1,0,0,0,171,71
Individual2,12,0,0,981,0

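I'm struggling to find the best way to do this. I first tried reading the file in as a list of lists. With that I can create the table, but without zeros where there is no matching test result, by doing this: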

import csv
individualslist = [1,2,13,15,91]
resultlist = ['RESULT001', 'RESULT002', 'RESULT003', 'RESULT004', 'RESULT005']
intermediatelist = []
datafile = open(infile, 'rU')
datareader = csv.reader(datafile)
for row in datareader:
   intermediatelist.append(row)    
for individual in individualslist:
   resultfile.write(str(individual) + ',')
   for result in resultlist:
      for row in intermediatelist:
         if str(individual) == row[0] and result == row[1]:
            resultfile.write(result + ',' + str(row[2]) + ',')
   resultfile.write('\n')

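When I try to specify what should happen when no match for a RESULT is found, I end up writing far too much to the file (once for every row where the individual and RESULT don't match, which is most of them). With that in mind, a dictionary seems like the better way to go. In (sort of) pseudocode: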

for individual in individual_list:
   outfile.write(individual)
   for test in testlist:
      if test in dictionary_for_individual1:
         outfile.write(dictionary_for_individual1[test])
      else:
         outfile.write('0')

I can't work out how to read my file into a dictionary made up of one dictionary per individual and then access it correctly.

Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 1)

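Use the __missing__ hook in a simple dict subclass to accumulate the values, so that the formatted user ID is always part of each result sub-dict. Then write them back out with DictWriter, which looks up the necessary fields by name and fills in missing values automatically (via restval):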

import csv

# Make a dict subclass that autovivifies child dict with user field filled in
class AutoUserDict(dict):
    __slots__ = ()
    def __missing__(self, key):
        '''Expects int user ID, formats as Individual###'''
        self[key] = ret = {'user': 'Individual{}'.format(key)}
        return ret

resultlist = ['RESULT001', 'RESULT002', 'RESULT003', 'RESULT004', 'RESULT005']
intermediateresults = AutoUserDict()

with open(infile, 'rb') as datafile:
    datareader = csv.reader(datafile)
    for user, rslttype, value in datareader:
        # Store new rslttype (will create subdict with formatted user first if needed)
        intermediateresults[int(user)][rslttype] = int(value)

with open(outfile, 'wb') as outf:
    datawriter = csv.DictWriter(outf, fieldnames=['user']+resultlist, restval='0')
    for user, data in sorted(intermediateresults.items()):
        datawriter.writerow(data)
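
For readers on Python 3, the same idea can be sketched against in-memory data. This is only an illustration under stated assumptions, not the answerer's code: the sample rows are the ones from the question, while the build_table helper and the use of io.StringIO in place of real files are made up for the example.

```python
import csv
import io


class AutoUserDict(dict):
    """dict subclass that creates a per-user sub-dict on first access."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent: store and return
        # a new sub-dict that already carries the formatted user label.
        self[key] = ret = {'user': 'Individual{}'.format(key)}
        return ret


def build_table(csv_text, resultlist):
    """Hypothetical helper: return the wide table as CSV text."""
    results = AutoUserDict()
    for user, rslttype, value in csv.reader(io.StringIO(csv_text)):
        results[int(user)][rslttype] = int(value)

    out = io.StringIO()
    # restval supplies the value for any fieldname missing from a row dict.
    writer = csv.DictWriter(out, fieldnames=['user'] + resultlist,
                            restval='0', lineterminator='\n')
    for _, data in sorted(results.items()):
        writer.writerow(data)
    return out.getvalue()


sample = "1,RESULT004,171\n1,RESULT005,71\n2,RESULT001,12\n2,RESULT004,981\n"
resultlist = ['RESULT001', 'RESULT002', 'RESULT003', 'RESULT004', 'RESULT005']
table = build_table(sample, resultlist)
print(table)
```

Note that DictWriter raises ValueError by default if a row contains a result type that is not listed in fieldnames; passing extrasaction='ignore' would drop such values instead.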

Answer 1: (score: 0)

You can use a dictionary to get the functionality of a 2D array:

individuals = []
results = []
counts = {}
# read data
with open(inp_file_name,'r') as inp_file:
  for inp_line in inp_file:
    inp_list = inp_line.strip().split(',')
    i,r,c = inp_list
    if i not in individuals: individuals.append(i)
    if r not in results: results.append(r)
    counts[i,r] = int(c)
# optional sort 
individuals.sort()
results.sort()
# print data
with open(out_file_name,'w') as out_file:
  # header
  out_list = [''] + results
  out_file.write( "%s\n" % (" ".join(out_list)))
  # table
  for i in individuals:
    out_list = [ i ]
    for r in results:
      c = counts.get((i,r),0)  
      out_list.append( "%d" % c )
    out_file.write( "%s\n" % (" ".join(out_list)))

This assumes each individual/result combination occurs only once. Otherwise, change the assignment to counts[i,r] = int(c) + counts.get((i,r),0)
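
A minimal sketch of that accumulating variant, using made-up rows in which individual "1" has RESULT004 twice (the rows list is hypothetical and stands in for the parsed file):

```python
# Hypothetical parsed rows: (individual, result type, count as text).
rows = [
    ("1", "RESULT004", "171"),
    ("1", "RESULT004", "29"),
    ("2", "RESULT001", "12"),
]

counts = {}
for i, r, c in rows:
    # Add to whatever is already stored for this (individual, result) pair;
    # get() returns 0 the first time the pair is seen.
    counts[i, r] = int(c) + counts.get((i, r), 0)

print(counts)
```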

You can also change the separator in the two " ".join calls to "," or "\t".

It also relies on the fact that counts[i,r] is equivalent to counts[(i,r)] (the key of the dict element is a 2-tuple).
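
The tuple-key behaviour can be checked with a few lines; the values here are illustrative only:

```python
counts = {}
counts["1", "RESULT004"] = 171               # a comma in the subscript builds a tuple

same_key = counts[("1", "RESULT004")]        # explicit tuple, same entry
missing = counts.get(("2", "RESULT004"), 0)  # get() needs the tuple written out

print(same_key, missing)
```

The extra parentheses are required in the get() call because without them the two values would be passed as separate arguments.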

You could add some error checking.
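
One possible shape for that checking, as a sketch only: parse_line is a hypothetical helper, not part of the answer above, and the decision to raise ValueError with the line number is an assumption about what would be useful.

```python
def parse_line(inp_line, line_no):
    """Hypothetical helper: split one input line and validate it."""
    fields = inp_line.strip().split(',')
    if len(fields) != 3:
        raise ValueError("line %d: expected 3 fields, got %d"
                         % (line_no, len(fields)))
    i, r, c = fields
    try:
        count = int(c)
    except ValueError:
        raise ValueError("line %d: count %r is not an integer" % (line_no, c))
    return i, r, count


good = parse_line("1,RESULT004,171\n", 1)

try:
    parse_line("1,RESULT004\n", 2)
    error = None
except ValueError as exc:
    error = str(exc)

print(good, error)
```

In the reading loop, the call would replace the bare split, with enumerate(inp_file, 1) supplying line_no.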