I have a CSV file with a bunch of rows like the following, where the columns are Individual#, ResultType, Count:
1,RESULT004,171
1,RESULT005,71
2,RESULT001,12
2,RESULT004,981
......
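My goal is to end up with a human-readable table that has the individuals on the rows and the number of times each result occurred as the columns. If an individual doesn't have a result, I want a zero there. Like this: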
Individual1,0,0,0,171,71
Individual2,12,0,0,981,0
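I'm struggling to find the best way to do this. I first tried reading the file as a list of lists. With the code below I can create the table, but there are no zeros where there is no matching test result: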
import csv

individuals = [1, 2, 13, 15, 91]
resultlist = ['RESULT001', 'RESULT002', 'RESULT003', 'RESULT004', 'RESULT005']
intermediatelist = []

# infile (input path) and resultfile (open output file) are assumed to be defined elsewhere
datafile = open(infile, newline='')
datareader = csv.reader(datafile)
for row in datareader:
    intermediatelist.append(row)

for individual in individuals:
    resultfile.write(str(individual) + ',')
    for result in resultlist:
        for row in intermediatelist:
            if str(individual) == row[0] and result == row[1]:
                resultfile.write(result + ',' + str(row[2]) + ',')
    resultfile.write('\n')
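When I try to specify what should happen when no match for a RESULT is found, I end up writing far too much to the file (for example, once for every row where the individual and RESULT don't match, which is most of them). With that in mind, it seems a dictionary would be a better way to go. In (sort of) pseudocode: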
for individual in individual_list:
    outfile.write(individual)
    for test in testlist:
        if test in dictionary_for_individual1:
            outfile.write(dictionary_for_individual1[test])
        else:
            outfile.write('0')
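I can't figure out how to read my file into a dictionary of dictionaries, one per individual, and then access it correctly.
Any help would be much appreciated.
Answer 0 (score: 1)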
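Use a __missing__ hook in a simple dict subclass to accumulate the values, so that the formatted user ID is always part of each result sub-dict. Then write them back out with DictWriter, which looks up the necessary fields automatically and fills in missing values automatically: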
import csv

# Make a dict subclass that autovivifies child dict with user field filled in
class AutoUserDict(dict):
    __slots__ = ()

    def __missing__(self, key):
        '''Expects int user ID, formats as Individual###'''
        self[key] = ret = {'user': 'Individual{}'.format(key)}
        return ret

resultlist = ['RESULT001', 'RESULT002', 'RESULT003', 'RESULT004', 'RESULT005']
intermediateresults = AutoUserDict()

# Python 3: open CSV files in text mode with newline=''
with open(infile, newline='') as datafile:
    datareader = csv.reader(datafile)
    for user, rslttype, value in datareader:
        # Store new rslttype (will create subdict with formatted user first if needed)
        intermediateresults[int(user)][rslttype] = int(value)

# Open the output file for writing; restval fills in any result type a user lacks
with open(outfile, 'w', newline='') as outf:
    datawriter = csv.DictWriter(outf, fieldnames=['user'] + resultlist, restval='0')
    for user, data in sorted(intermediateresults.items()):
        datawriter.writerow(data)
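To see the two pieces working together without touching the filesystem, here is a small self-contained sketch that runs the same idea on the four sample rows from the question. The pivot helper and the use of io.StringIO in place of real files are illustrative assumptions, not part of the answer above.

```python
import csv
import io


class AutoUserDict(dict):
    """dict subclass that creates a per-user sub-dict on first access."""
    __slots__ = ()

    def __missing__(self, key):
        # Called only when key is absent; store and return a fresh sub-dict
        self[key] = ret = {'user': 'Individual{}'.format(key)}
        return ret


def pivot(src, dst, resultlist):
    """Hypothetical helper: read (user, result type, count) rows from src
    and write one wide row per user to dst."""
    table = AutoUserDict()
    for user, rslttype, value in csv.reader(src):
        table[int(user)][rslttype] = int(value)
    # restval supplies the value for any field missing from a row dict
    writer = csv.DictWriter(dst, fieldnames=['user'] + resultlist, restval='0')
    for _, data in sorted(table.items()):
        writer.writerow(data)


resultlist = ['RESULT001', 'RESULT002', 'RESULT003', 'RESULT004', 'RESULT005']
sample = io.StringIO(
    "1,RESULT004,171\n"
    "1,RESULT005,71\n"
    "2,RESULT001,12\n"
    "2,RESULT004,981\n"
)
out = io.StringIO()
pivot(sample, out, resultlist)
print(out.getvalue())
```

On this sample it writes the two rows shown in the question (Individual1,0,0,0,171,71 and Individual2,12,0,0,981,0). Note that DictWriter raises ValueError by default if a row contains a result type that is not listed in fieldnames, so resultlist has to cover every type in the file.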
Answer 1 (score: 0)
You can use a dictionary to get the functionality of a 2D array:
individuals = []
results = []
counts = {}
# read data
with open(inp_file_name, 'r') as inp_file:
    for inp_line in inp_file:
        inp_list = inp_line.strip().split(',')
        i, r, c = inp_list
        if i not in individuals: individuals.append(i)
        if r not in results: results.append(r)
        counts[i, r] = int(c)
# optional sort (IDs are strings here, so this is lexical; use key=int for numeric order)
individuals.sort()
results.sort()
# print data
with open(out_file_name, 'w') as out_file:
    # header
    out_list = [''] + results
    out_file.write("%s\n" % (" ".join(out_list)))
    # table
    for i in individuals:
        out_list = [i]
        for r in results:
            c = counts.get((i, r), 0)
            out_list.append("%d" % c)
        out_file.write("%s\n" % (" ".join(out_list)))
This assumes each individual/result combination occurs only once. Otherwise, change the assignment to counts[i,r] = int(c) + counts.get((i,r),0).
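As a rough sketch of the accumulating variant, the same summing can be done with collections.defaultdict instead of counts.get. This is a swapped-in alternative rather than the code from the answer, and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Made-up rows: individual "1" has RESULT004 twice
rows = [
    ("1", "RESULT004", "100"),
    ("1", "RESULT004", "71"),
    ("2", "RESULT001", "12"),
]

counts = defaultdict(int)  # missing keys start at 0
for i, r, c in rows:
    counts[i, r] += int(c)  # sums repeated (individual, result) pairs

# Read with .get so that looking up a missing pair does not insert it
total = counts.get(("1", "RESULT004"), 0)
missing = counts.get(("2", "RESULT004"), 0)
print(total, missing)
```

Reading with counts[key] on a defaultdict would silently add the key, which is why the lookup side still uses .get with a default of 0.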
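You can also change the separator in the two join calls to "," or "\t".
It also takes advantage of the fact that counts[i,r] is equivalent to counts[(i,r)] (the key of the dict element is a 2-tuple).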
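Both points can be shown in a few lines. The format_table function below is a hypothetical helper, not part of the answer: it takes the separator as a parameter and looks counts up by 2-tuple key.

```python
def format_table(individuals, results, counts, sep=","):
    """Hypothetical helper: build the table as one string, using sep between cells."""
    lines = [sep.join([""] + results)]  # header row with an empty corner cell
    for i in individuals:
        cells = [i] + [str(counts.get((i, r), 0)) for r in results]
        lines.append(sep.join(cells))
    return "\n".join(lines)


counts = {}
counts["1", "RESULT004"] = 171    # same key as counts[("1", "RESULT004")]
counts[("2", "RESULT001")] = 12   # explicit tuple, identical behaviour

table = format_table(["1", "2"], ["RESULT001", "RESULT004"], counts, sep="\t")
print(table)
```

Passing sep="," instead gives comma-separated output that other CSV tools can read back.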
You can add some error checking.
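What that checking looks like depends on the data, but one possible sketch is below. The read_counts function is a hypothetical helper; it assumes blank lines should be skipped, that any row without exactly three fields is an error, and that repeated pairs should be summed.

```python
import csv
import io


def read_counts(lines):
    """Hypothetical helper: parse rows of individual,result,count into a
    dict keyed by (individual, result), validating each row."""
    counts = {}
    for rowno, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        if len(row) != 3:
            raise ValueError("row %d: expected 3 fields, got %d" % (rowno, len(row)))
        i, r, c = (field.strip() for field in row)
        try:
            value = int(c)
        except ValueError:
            raise ValueError("row %d: count %r is not an integer" % (rowno, c))
        counts[i, r] = counts.get((i, r), 0) + value
    return counts


counts = read_counts(io.StringIO("1,RESULT004,171\n\n2,RESULT001,12\n"))
print(counts)
```

An open file object can be passed in place of the io.StringIO used here, since csv.reader accepts any iterable of lines.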