How do I produce output like this:
user I R H
=================
atl001 2 1 0
cms017 1 2 1
lhc003 0 1 2
from a list like this:
atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R
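i.e. I want to count the number of I, H and R entries for each user. Note that in this particular case I can't use groupby from itertools. Thanks in advance for your help. Cheers!!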
Answer 0 (score: 6)
data='''atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R'''
stats={}
for i in data.split('\n'):
    user, irh = i.split()
    u = stats.setdefault(user, {})
    u[irh] = u.setdefault(irh, 0) + 1
print 'user I R H'
for user in sorted(stats):
    stat = stats[user]
    print user, stat.get('I', 0), stat.get('R', 0), stat.get('H', 0)
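The code above is Python 2 (it uses the print statement). A minimal Python 3 sketch of the same nested-dict approach, assuming the question's sample data, might look like this; dict.get replaces the second setdefault because the count does not need to be stored before it is incremented:

```python
# Python 3 sketch of the nested-dict counting approach (sample data from the question).
data = """atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R"""

stats = {}
for line in data.splitlines():
    user, irh = line.split()
    u = stats.setdefault(user, {})  # per-user dict: flag -> count
    u[irh] = u.get(irh, 0) + 1      # flags not seen yet start at 0

rows = ["user I R H"]
for user in sorted(stats):
    stat = stats[user]
    rows.append(f"{user} {stat.get('I', 0)} {stat.get('R', 0)} {stat.get('H', 0)}")
print("\n".join(rows))
```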
Answer 1 (score: 2)
data = 112*'cms017 R\n'
data = data + '''atl001 I
cms017 R
atl001 I
cms017 H
atl001 R
lhcabc003 H
cms017 R
lhcabc003 H
lhcabc003 R
cms017 R
cms017 R
cms017 R'''
print data,'\n'
stats = {}
d = {'I':0,'R':1,'H':2}
L = 0
for line in data.splitlines():
    user,irh = line.split()
    stats.setdefault(user,[0,0,0])
    stats[user][d[irh]] += 1
    L = max(L, len(user))
LL = len(str(max(max(stats[user])
                 for user in stats )))
cale = ' %%%ds %%%ds %%%ds' % (LL,LL,LL)
ch = 'user'.ljust(L) + cale % ('I','R','H')
print '%s\n%s' % (ch, len(ch)*'=')
print '\n'.join(user.ljust(L) + cale % tuple(stats[user])
                for user in sorted(stats.keys()))
Result
user        I   R   H
=====================
atl001      2   1   0
cms017      0 117   1
lhcabc003   0   1   2
Also:
data = 14*'cms017 R\n'
data = data + '''atl001 I
cms017 R
atl001 I
cms017 H
atl001 R
lhcabc003 H
cms017 R
lhcabc003 H
lhcabc003 R
cms017 R
cms017 R
cms017 R'''
print data,'\n'
Y = {}
L = 0
for line in data.splitlines():
    user,irh = line.split()
    L = max(L, len(user))
    if (user,irh) not in Y:
        # first time this user is seen: create all three counters
        Y.update({(user,'I'):0,(user,'R'):0,(user,'H'):0})
    Y[(user,irh)] += 1
LL = len(str(max(x for x in Y.itervalues())))
cale = '%%-%ds %%%ds %%%ds %%%ds' % (L,LL,LL,LL)
ch = cale % ('user','I','R','H')
print '%s\n%s' % (ch, len(ch)*'=')
# sorted keys come in triples per user: (user,'H'), (user,'I'), (user,'R')
li = sorted(Y.keys())
print '\n'.join(cale % (a[0],Y[b],Y[c],Y[a])
                for a,b,c in (li[x:x+3] for x in xrange(0,len(li),3)))
Result
user       I  R  H
==================
atl001     2  1  0
cms017     0 19  1
lhcabc003  0  1  2
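PS:
In my code, user names are left-justified in a field of L characters. To avoid the complexity of Sebastian's code, the I, R and H columns are all right-justified in the same width LL, which is the number of digits of the largest count appearing in any of these columns.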
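The width rule described in the PS can be sketched in Python 3 with format specs instead of building '%%-%ds' strings by hand. The counts below are the ones from the second result above, and the helper name fmt is hypothetical:

```python
# Sketch: derive column widths from the data, as described in the PS.
stats = {"atl001": [2, 1, 0], "cms017": [0, 19, 1], "lhcabc003": [0, 1, 2]}

L = max(len(user) for user in stats)                                # widest user name
LL = max(len(str(n)) for counts in stats.values() for n in counts)  # digits of the largest count

def fmt(first, *cols):
    # user name left-justified in L chars, every count right-justified in LL chars
    return f"{first:<{L}} " + " ".join(f"{c:>{LL}}" for c in cols)

header = fmt("user", "I", "R", "H")
lines = [header, "=" * len(header)]
lines += [fmt(user, *stats[user]) for user in sorted(stats)]
print("\n".join(lines))
```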
Answer 2 (score: 1)
Well, there is no point in using groupby for this anyway. For starters, your data isn't sorted (groupby does not sort the groups for you), and the lines are very simple.
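The point about sorting can be checked directly: itertools.groupby only merges consecutive items with equal keys. A small Python 3 illustration, using the first few lines of the question's data:

```python
from itertools import groupby

lines = ["atl001 I", "atl001 I", "cms017 H", "atl001 R", "lhc003 H"]

def user_of(line):
    # key function: the user name is the first field
    return line.split()[0]

# Unsorted input: atl001 is split into two separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(lines, key=user_of)]
print(unsorted_groups)  # [('atl001', 2), ('cms017', 1), ('atl001', 1), ('lhc003', 1)]

# Sorting by the same key first yields exactly one group per user.
sorted_groups = [(k, len(list(g)))
                 for k, g in groupby(sorted(lines, key=user_of), key=user_of)]
print(sorted_groups)  # [('atl001', 3), ('cms017', 1), ('lhc003', 1)]
```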
Just keep counts as you process each line. I'm assuming you don't know in advance which flags you will get:
from sets import Set as set # python2.3 compatibility
counts = {} # counts stored in user -> dict(flag=counter) nested dicts
flags = set()
for line in inputfile:  # inputfile: an open file object or any iterable of lines
    user, flag = line.strip().split()
    usercounts = counts.setdefault(user, {})
    usercounts[flag] = usercounts.setdefault(flag, 0) + 1
    flags.add(flag)
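After that, printing the information is a matter of iterating over the counts structure. I'm assuming user names are always 6 characters long: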
flags = list(flags)
flags.sort()
users = counts.keys()
users.sort()
print "user %s" % (' '.join(flags))
print "=" * (6 + 3 * len(flags))
for user in users:
    line = [user]
    for flag in flags:
        line.append(str(counts[user].get(flag, 0)))  # join() needs strings, not ints
    print ' '.join(line)
All the code above is untested, but it should roughly work.
Answer 3 (score: 1)
Here is a variant that uses nested dicts to count job statuses and computes the maximum field widths before printing:
#!/usr/bin/env python
import fileinput
from sets import Set as set # python2.3
# parse job statuses
counter = {}
for line in fileinput.input():
    user, jobstatus = line.split()
    d = counter.setdefault(user, {})
    d[jobstatus] = d.setdefault(jobstatus, 0) + 1
# print job statuses
# . find field widths
status_names = set([name for st in counter.itervalues() for name in st])
maxstatuslens = [max([len(str(i)) for st in counter.itervalues()
                                  for n, i in st.iteritems()
                                  if name == n])
                 for name in status_names]
maxuserlen = max(map(len, counter))
row_format = (("%%-%ds " % maxuserlen) +
              " ".join(["%%%ds" % n for n in maxstatuslens]))
# . print header
header = row_format % (("user",) + tuple(status_names))
print header
print '='*len(header)
# . print rows
for user, statuses in counter.iteritems():
    print row_format % (
        (user,) + tuple([statuses.get(name, 0) for name in status_names]))
$ python print-statuses.py <input.txt
user   I H R
============
lhc003 0 2 1
cms017 1 1 2
atl001 2 0 1
Here is a variant that uses a flat dictionary with (user, status_name) tuples as keys:
#!/usr/bin/env python
import fileinput
from sets import Set as set # python 2.3
# parse job statuses
counter = {}
maxstatuslens = {}
maxuserlen = 0
for line in fileinput.input():
    key = user, status_name = tuple(line.split())
    i = counter[key] = counter.setdefault(key, 0) + 1
    maxstatuslens[status_name] = max(maxstatuslens.setdefault(status_name, 0),
                                     len(str(i)))
    maxuserlen = max(maxuserlen, len(user))
# print job statuses
row_format = (("%%-%ds " % maxuserlen) +
              " ".join(["%%%ds" % n for n in maxstatuslens.itervalues()]))
# . print header
header = row_format % (("user",) + tuple(maxstatuslens))
print header
print '='*len(header)
# . print rows
for user in set([k[0] for k in counter]):
    print row_format % ((user,) +
        tuple([counter.get((user, status), 0) for status in maxstatuslens]))
Usage and output are the same.
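A Python 3 take on the same flat-dict idea can use collections.Counter keyed by (user, status) tuples. This is a swapped-in standard-library class rather than part of the answer above, and the sample data is the question's. Missing pairs read as 0 because Counter returns 0 for absent keys:

```python
from collections import Counter

data = """atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R"""

# One counter entry per (user, status) pair.
counter = Counter(tuple(line.split()) for line in data.splitlines())
users = sorted({user for user, _ in counter})
statuses = sorted({status for _, status in counter})  # alphabetical: H, I, R

rows = [" ".join(["user"] + statuses)]
for user in users:
    # counter[(user, s)] is 0 for pairs that never occurred
    rows.append(" ".join([user] + [str(counter[(user, s)]) for s in statuses]))
print("\n".join(rows))
```

To get the question's I R H column order, replace the sorted set with an explicit list such as ["I", "R", "H"].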
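Answer 4 (score: 0)
As a hint:
Use a nested dictionary structure to count occurrences:
user -> character -> number of occurrences of that character for that user
Writing the parser code, incrementing the counters and printing the result is up to you... a good exercise.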
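One way to build the suggested user -> character -> count structure in Python 3 is a nested collections.defaultdict, so neither level needs setdefault. This is a sketch on a few of the question's lines; printing is left out:

```python
from collections import defaultdict

# user -> flag character -> number of occurrences
counts = defaultdict(lambda: defaultdict(int))
for line in ["atl001 I", "atl001 I", "cms017 H", "atl001 R"]:
    user, flag = line.split()
    counts[user][flag] += 1  # both levels are created on first access

print({user: dict(flags) for user, flags in counts.items()})
# {'atl001': {'I': 2, 'R': 1}, 'cms017': {'H': 1}}
```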