I have the following (space-delimited) input:
2012-10-05 PETER 6
2012-10-05 PETER 4
2012-10-06 PETER 60
2012-10-06 TOM 10
2012-10-08 SOMNATH 80
I want to produce the following pipe-delimited output (where the columns are [date and name, number of entries, sum of the last column]):
2012-10-05 PETER|2|10
2012-10-06 PETER|1|60
2012-10-06 TOM|1|10
2012-10-08 SOMNATH|1|80
Here is my code so far:
s = open("output.txt","r")
fn=s.readlines()
d = {}
for line in fn:
    parts = line.split()
    if parts[0] in d:
        d[parts[0]][1] += int(parts[2])
        d[parts[0]][2] += 1
    else:
        d[parts[0]] = [parts[1], int(parts[2]), 1]
for date in sorted(d):
    print("%s %s|%d|%d" % (date, d[date][0], d[date][2], d[date][1]))
The output I get is:
2012-10-06 PETER|2|70
instead of
2012-10-06 PETER|1|60
and TOM does not appear in the list.
What do I need to do to correct my code?
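Answer 0 (score: 2)
import collections

# Collect every value under its (date, name) pair.
d = collections.defaultdict(list)
with open('output.txt', 'r') as f:
    for line in f:
        date, name, val = line.split()
        d[date, name].append(int(val))
# The count is the number of values collected; the total is their sum.
for (date, name), vals in sorted(d.items()):
    print('%s %s|%d|%d' % (date, name, len(vals), sum(vals)))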
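The same idea can be wrapped in a function so it can be checked against the sample input without a file. This is only a sketch: the name `summarize` and the in-memory `sample` list are assumptions made for illustration and are not part of the answer above.

```python
import collections

def summarize(lines):
    """Group values by (date, name); return 'date name|count|total' rows."""
    groups = collections.defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        date, name, val = line.split()
        groups[date, name].append(int(val))
    return ['%s %s|%d|%d' % (date, name, len(vals), sum(vals))
            for (date, name), vals in sorted(groups.items())]

# Hypothetical in-memory copy of the input from the question.
sample = [
    "2012-10-05 PETER 6",
    "2012-10-05 PETER 4",
    "2012-10-06 PETER 60",
    "2012-10-06 TOM 10",
    "2012-10-08 SOMNATH 80",
]
for row in summarize(sample):
    print(row)
```

Passing an open file object instead of `sample` should work the same way, since iterating a file yields its lines.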
Answer 1 (score: 0)
<3 itertools:
import itertools

with open('output.txt', 'r') as f:
    # Split each non-blank line into [date, name, value].
    splitlines = (line.split() for line in f if line.strip())
    # groupby merges consecutive rows that share the same date and name.
    for (date, name), bits in itertools.groupby(splitlines, key=lambda bits: bits[:2]):
        total = 0
        count = 0
        for _, _, val in bits:
            total += int(val)
            count += 1
        print('%s %s|%d|%d' % (date, name, count, total))
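`itertools.groupby` only merges consecutive items with equal keys, so the loop above depends on the file already being ordered by date and name. If that order is not guaranteed, one option is to sort the split rows before grouping. The sketch below assumes the whole input fits in memory; `summarize_sorted` and `unsorted` are hypothetical names used only for this example.

```python
import itertools

def summarize_sorted(lines):
    """Sort rows by (date, name), then group consecutive equal keys."""
    rows = sorted(line.split() for line in lines if line.strip())
    out = []
    for (date, name), group in itertools.groupby(rows, key=lambda r: tuple(r[:2])):
        vals = [int(val) for _, _, val in group]
        out.append('%s %s|%d|%d' % (date, name, len(vals), sum(vals)))
    return out

# Hypothetical input that is deliberately out of order.
unsorted = [
    "2012-10-06 TOM 10",
    "2012-10-05 PETER 6",
    "2012-10-06 PETER 60",
    "2012-10-05 PETER 4",
]
for row in summarize_sorted(unsorted):
    print(row)
```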
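If you don't want to use groupby (or it isn't available, or your input data isn't guaranteed to be sorted), here is a traditional solution (really just a fixed version of your code):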
d = {}
with open('output.txt', 'r') as f:
    for line in f:
        date, name, val = line.split()
        key = (date, name)
        if key not in d:
            d[key] = [0, 0]  # [total, count]
        d[key][0] += int(val)
        d[key][1] += 1
for key in sorted(d):
    date, name = key
    total, count = d[key]
    print('%s %s|%d|%d' % (date, name, count, total))
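Note that we use (date, name) as the key, rather than just date. Keying on date alone is what merged TOM's row into PETER's entry for 2012-10-06 in your version.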
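To see the effect of the key choice on the sample input, the two groupings can be compared side by side. This is an illustrative sketch only; `rows`, `by_date` and `by_date_name` are names made up for the example.

```python
# Sample rows from the question, already split into (date, name, value).
rows = [
    ("2012-10-05", "PETER", 6),
    ("2012-10-05", "PETER", 4),
    ("2012-10-06", "PETER", 60),
    ("2012-10-06", "TOM", 10),
    ("2012-10-08", "SOMNATH", 80),
]

by_date = {}       # keyed on date only, as in the question's code
by_date_name = {}  # keyed on (date, name), as in the fixed version
for date, name, val in rows:
    by_date.setdefault(date, []).append(val)
    by_date_name.setdefault((date, name), []).append(val)

# With date alone, PETER's 60 and TOM's 10 land in the same bucket.
print(by_date["2012-10-06"])                 # [60, 10]
# With (date, name), each person keeps a separate bucket.
print(by_date_name[("2012-10-06", "PETER")])  # [60]
print(by_date_name[("2012-10-06", "TOM")])    # [10]
```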