I have collated some data and turned it into a dictionary, as shown below:
gen_dict = {
"item_C_v001" : "jack",
"item_C_v002" : "kris",
"item_A_v003" : "john",
"item_B_v006" : "peter",
"item_A_v005" : "john",
"item_A_v004" : "dave"
}
I am trying to print out the results in the following format:
Item Name | No. of Vers. | User
item_A | 3 | dave, john
item_B | 1 | peter
item_C | 2 | jack, kris
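It collapses similar versions into one row, counts how many versions there are, and lists the user names.
I am unable to integrate the user names. I used set(), but the same set seems to be applied to all 3 rows of my output.
Even though my 'Item Name' and 'No. of Vers.' columns seem correct, is there any way to check that the number of versions found actually matches each name? With a small data set I can count by hand, but what if I get a large one?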
strip_ver_list = []
user_list = []
for item_name, user in gen_dict.iteritems():
    # Strip out the version digits
    strip_ver = item_name[:-3]
    strip_ver_list.append(strip_ver)
    user_list.append(user)
# This will count and remove the duplicates
versions_num = dict((duplicate, strip_ver_list.count(duplicate)) for duplicate in strip_ver_list)
for name, num in sorted(versions_num.iteritems()):
    print "Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(name, num, set(user_list))
This is the output I get:
Item Name | No. of Vers. | User
item_A | 3 | set(['dave', 'john', 'jack', 'kris', 'peter'])
item_B | 1 | set(['dave', 'john', 'jack', 'kris', 'peter'])
item_C | 2 | set(['dave', 'john', 'jack', 'kris', 'peter'])
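This is the only approach I could think of, but if there are other viable ways to solve this, please share them with me.
Answer 0: (score: 1)
You need to group the list by item name and extract the users from each group; otherwise user_list will always be the global list of users: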
from itertools import groupby
# split the item_version
sorted_ver_num = sorted(k.rsplit("_", 1) + [v] for k, v in gen_dict.items())
# group the results by the item name
for k, g in groupby(sorted_ver_num, key = lambda x: x[0]):
    # extract the user list within each group
    # user_list = [user for *_, user in g]
    user_list = [user for _, _, user in g]
    print("Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(k, len(user_list), set(user_list)))
Version Name : item_A
No. of Versions : 3
Users : {'dave', 'john'}
Version Name : item_B
No. of Versions : 1
Users : {'peter'}
Version Name : item_C
No. of Versions : 2
Users : {'kris', 'jack'}
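If the exact table layout from the question is wanted, the same grouping can feed a formatted row. This is a minimal sketch assuming Python 3; the helper name `build_rows` and the column widths are illustrative choices, not part of the answer above. The users are sorted because set ordering is arbitrary.

```python
from itertools import groupby

gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}


def build_rows(data):
    """Return (item_name, version_count, sorted_users) tuples, one per item."""
    # Split "item_C_v001" into ["item_C", "v001"] and append the user.
    records = sorted(key.rsplit("_", 1) + [user] for key, user in data.items())
    rows = []
    # groupby only merges adjacent records, so the sort above is required.
    for name, group in groupby(records, key=lambda rec: rec[0]):
        users = [rec[2] for rec in group]
        rows.append((name, len(users), sorted(set(users))))
    return rows


rows = build_rows(gen_dict)
row_format = "{:<10}| {:<13}| {}"  # hypothetical column widths
print(row_format.format("Item Name", "No. of Vers.", "User"))
for name, count, users in rows:
    print(row_format.format(name, count, ", ".join(users)))
```

The version count here is the number of keys per item, which matches the question's table as long as every key is a distinct version.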
Answer 1: (score: 1)
I would use a defaultdict to aggregate the data. Roughly:
>>> from collections import defaultdict
>>> gen_dict = {
... "item_C_v001" : "jack",
... "item_C_v002" : "kris",
... "item_A_v003" : "john",
... "item_B_v006" : "peter",
... "item_A_v005" : "john",
... "item_A_v004" : "dave"
... }
Now...
>>> versions_num = defaultdict(lambda:dict(versions=set(), users = set()))
>>> for item_name, user in gen_dict.items():
...     strip_ver = item_name[:-5]
...     version_num = item_name[-3:]
...     versions_num[strip_ver]['versions'].add(version_num)
...     versions_num[strip_ver]['users'].add(user)
...
Finally,
>>> for item, data in versions_num.items():
...     print("Item {} \tno. of Versions: {}\tUsers:{}".format(item, len(data['versions']), ",".join(data['users'])))
...
Item item_B no. of Versions: 1 Users:peter
Item item_A no. of Versions: 3 Users:john,dave
Item item_C no. of Versions: 2 Users:kris,jack
>>>
If you want it sorted:
>>> for item, data in sorted(versions_num.items()):
...     print("Item {} \tno. of Versions: {}\tUsers:{}".format(item, len(data['versions']), ",".join(data['users'])))
...
Item item_A no. of Versions: 3 Users:john,dave
Item item_B no. of Versions: 1 Users:peter
Item item_C no. of Versions: 2 Users:kris,jack
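The slices `[:-5]` and `[-3:]` assume every key ends in `_v` followed by exactly three digits. If that is not guaranteed, splitting on the last `_v` is one width-independent alternative. The sketch below assumes Python 3; the helper name `summarize` and the four-digit key `item_A_v0010` are made up for illustration.

```python
from collections import defaultdict


def summarize(data):
    """Map item name -> {'versions': set, 'users': set}, splitting on the last '_v'."""
    summary = defaultdict(lambda: {"versions": set(), "users": set()})
    for key, user in data.items():
        # rpartition splits on the LAST "_v", so the digit count does not matter.
        item, sep, version = key.rpartition("_v")
        if not sep:
            raise ValueError("key has no '_v' version suffix: {!r}".format(key))
        summary[item]["versions"].add(version)
        summary[item]["users"].add(user)
    return summary


sample = {
    "item_A_v003": "john",
    "item_A_v0010": "dave",  # hypothetical four-digit version
    "item_B_v006": "peter",
}
result = summarize(sample)
for item, data in sorted(result.items()):
    print(item, len(data["versions"]), ", ".join(sorted(data["users"])))
```

Raising on a key without the suffix is a design choice for the sketch; silently skipping such keys would hide malformed data.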
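Answer 2: (score: 1)
I would use a defaultdict to keep track of the users and a plain dict to keep track of the counts. The dict.get() method lets you return a default value if the key is not found, 0 in this case, so you simply add 1 to it each time the key is encountered.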
from collections import defaultdict
gen_dict = {
"item_C_v001" : "jack",
"item_C_v002" : "kris",
"item_A_v003" : "john",
"item_B_v006" : "peter",
"item_A_v005" : "john",
"item_A_v004" : "dave"
}
user_dict = defaultdict(set)
count_dict = {}
for item_name, user in gen_dict.iteritems():
    user_dict[item_name[:-3]].add(user) # Sure you want -3 not -5?
    count_dict[item_name[:-3]] = count_dict.get(item_name[:-3], 0) + 1
for name, num in sorted(count_dict.iteritems()):
    print "Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(
        name, num, ', '.join(item for item in user_dict[name]))
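For the question's concern about checking the counts on larger data, one option is to compute them a second, independent way and compare. The sketch below assumes Python 3 and pairs the `dict.get()` tally with `collections.Counter`; it strips five characters rather than three so the names come out as `item_A` instead of `item_A_v`.

```python
from collections import Counter

gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}

# Tally with dict.get(), as in the answer above.
count_dict = {}
for item_name in gen_dict:
    name = item_name[:-5]
    count_dict[name] = count_dict.get(name, 0) + 1

# Independent tally with Counter.
counter = Counter(item_name[:-5] for item_name in gen_dict)

# Both tallies must agree, and the counts must add up to the number of keys.
assert count_dict == dict(counter)
assert sum(count_dict.values()) == len(gen_dict)
print(sorted(count_dict.items()))
```

The second assertion is a cheap sanity check: every key is counted exactly once, so the per-item counts have to sum to the size of the dictionary.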
Answer 3: (score: 1)
An example in IPython:
In [1]: gen_dict = {
...: "item_C_v001" : "jack",
...: "item_C_v002" : "kris",
...: "item_A_v003" : "john",
...: "item_B_v006" : "peter",
...: "item_A_v005" : "john",
...: "item_A_v004" : "dave"
...: }
Get the keys; we will need them more than once.
In [2]: keys = tuple(gen_dict.keys())
Find the set of items.
In [3]: items = set(j[:-5] for j in keys)
Table header and template.
In [4]: header = 'Item Name | No. of Vers. | User'
In [5]: template = '{:14}|{:<15}|{}'
Print the relevant information for all items.
In [6]: print(header)
Item Name | No. of Vers. | User
In [7]: for i in items:
...:     relevant = tuple(j for j in keys if j.startswith(i))
...:     users = set(gen_dict[x] for x in relevant)
...:     print(template.format(i, len(relevant), ' '.join(users)))
...:
item_A |3 |john dave
item_B |1 |peter
item_C |2 |kris jack
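One caveat with `startswith`: it is a prefix match, so an item whose name is a prefix of another item's name would also pick up the longer item's keys. The small sketch below compares the prefix match with an exact match on the stripped name; the key `item_AB_v001` and the user `zoe` are invented for illustration.

```python
data = {
    "item_A_v003": "john",
    "item_A_v004": "dave",
    "item_AB_v001": "zoe",  # hypothetical item whose name starts with "item_A"
}

keys = tuple(data)

# Prefix match: "item_AB_v001" also starts with "item_A".
prefix_matches = [k for k in keys if k.startswith("item_A")]

# Exact match on the name with the 5-character "_vNNN" suffix removed.
exact_matches = [k for k in keys if k[:-5] == "item_A"]

print(len(prefix_matches))  # 3
print(len(exact_matches))   # 2
```

With the data in the question no item name is a prefix of another, so both approaches give the same counts there.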