Counting and removing duplicates in keys while keeping the values

Time: 2017-02-06 20:25:05

Tags: python dictionary

I collated some data and turned it into a dictionary, as shown below:

gen_dict = {
 "item_C_v001" : "jack",
 "item_C_v002" : "kris",
 "item_A_v003" : "john",
 "item_B_v006" : "peter",
 "item_A_v005" : "john",
 "item_A_v004" : "dave"
}

I am trying to print the results in the following format:

Item Name     | No. of Vers.      | User
item_A        | 3                 | dave, john
item_B        | 1                 | peter
item_C        | 2                 | jack, kris

That is, it collapses the versions of the same item into one row, counts how many versions there are, and lists the user names.

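I can't get the user names integrated. I used set(), but the same set of users shows up on all 3 rows of my output. Also, even though my 'Item Name' and 'No. of Vers.' columns seem to be correct, is there any way to check that the number of versions found really belongs to each name? With a small data set I can count manually, but what if I get a large one?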

strip_ver_list = []
user_list = []
for item_name, user in gen_dict.iteritems():
    # Strip out the version digits
    strip_ver = item_name[:-3]
    strip_ver_list.append(strip_ver)
    user_list.append(user)


# This will count and remove the duplicates
versions_num = dict((duplicate, strip_ver_list.count(duplicate)) for duplicate in strip_ver_list)

for name, num in sorted(versions_num.iteritems()):
    print "Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(name, num, set(user_list))

This is the output I am getting:

Item Name     | No. of Vers.      | User
item_A        | 3                 | set(['dave', 'john', 'jack', 'kris', 'peter'])
item_B        | 1                 | set(['dave', 'john', 'jack', 'kris', 'peter'])
item_C        | 2                 | set(['dave', 'john', 'jack', 'kris', 'peter'])

This is the only way I could think of, but if there are other viable ways to solve this, please share them with me.

4 answers:

Answer 0 (score: 1)

You need to group the list by item name and extract the users from each group; otherwise user_list will always be the global list of all users:

from itertools import groupby
# split the item_version
sorted_ver_num = sorted(k.rsplit("_", 1) + [v] for k, v in gen_dict.items())

# group the results by the item name
for k, g in groupby(sorted_ver_num, key = lambda x: x[0]):
    # extract the user list within each group
    # user_list = [user for *_, user in g]  
    user_list = [user for _, _, user in g]
    print("Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(k, len(user_list), set(user_list)))


Version Name : item_A
No. of Versions : 3
Users : {'dave', 'john'}
Version Name : item_B
No. of Versions : 1
Users : {'peter'}
Version Name : item_C
No. of Versions : 2
Users : {'kris', 'jack'}
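The question also asks how to confirm the counts on larger data. One option is to count the name prefixes independently and compare them with the group sizes. The following is only a sketch under the assumption that every key ends in a single "_v..." suffix; the helper name summarize is hypothetical and not part of the answer above.

```python
from collections import Counter
from itertools import groupby

gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}


def summarize(data):
    """Return {item_name: (version_count, sorted_unique_users)} via groupby."""
    # Each row is [item_name, version, user]; sorting puts equal names together.
    rows = sorted(key.rsplit("_", 1) + [user] for key, user in data.items())
    summary = {}
    for name, group in groupby(rows, key=lambda row: row[0]):
        users = [user for _, _, user in group]
        summary[name] = (len(users), sorted(set(users)))
    return summary


summary = summarize(gen_dict)

# Independent tally of each prefix, used only as a sanity check.
expected = Counter(key.rsplit("_", 1)[0] for key in gen_dict)
assert all(summary[name][0] == count for name, count in expected.items())
# Every key must be accounted for exactly once.
assert sum(count for count, _ in summary.values()) == len(gen_dict)

print("Item Name     | No. of Vers.      | User")
for name, (count, users) in sorted(summary.items()):
    print("{:<14}| {:<18}| {}".format(name, count, ", ".join(users)))
```

Sorting the users before joining them keeps the printed order stable, which a bare set does not guarantee.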

Answer 1 (score: 1)

I would use a defaultdict to aggregate the data. Roughly:

>>> from collections import defaultdict
>>> gen_dict = {
...  "item_C_v001" : "jack",
...  "item_C_v002" : "kris",
...  "item_A_v003" : "john",
...  "item_B_v006" : "peter",
...  "item_A_v005" : "john",
...  "item_A_v004" : "dave"
... }

Now...

>>> versions_num = defaultdict(lambda:dict(versions=set(), users = set()))
>>> for item_name, user in gen_dict.items():
...     strip_ver = item_name[:-5]
...     version_num = item_name[-3:]
...     versions_num[strip_ver]['versions'].add(version_num)
...     versions_num[strip_ver]['users'].add(user)
...

Finally,

>>> for item, data in versions_num.items():
...     print("Item {} \tno. of Versions: {}\tUsers:{}".format(item, len(data['versions']), ",".join(data['users'])))
...
Item item_B     no. of Versions: 1      Users:peter
Item item_A     no. of Versions: 3      Users:john,dave
Item item_C     no. of Versions: 2      Users:kris,jack
>>>

If you want it sorted:

>>> for item, data in sorted(versions_num.items()):
...     print("Item {} \tno. of Versions: {}\tUsers:{}".format(item, len(data['versions']), ",".join(data['users'])))
...
Item item_A     no. of Versions: 3      Users:john,dave
Item item_B     no. of Versions: 1      Users:peter
Item item_C     no. of Versions: 2      Users:kris,jack
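The slices [:-5] and [-3:] only work while the version suffix is exactly three digits long. A possible variation, sketched here under the assumption that keys always look like "<name>_v<digits>", parses the key with a regular expression instead. The pattern and the parse_key helper are illustrative names, not part of the answer.

```python
import re
from collections import defaultdict

# Assumed key layout: anything, then "_v", then one or more digits.
KEY_PATTERN = re.compile(r"^(?P<name>.+)_v(?P<version>\d+)$")


def parse_key(key):
    """Split 'item_A_v003' into ('item_A', 3); reject unexpected keys."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("unexpected key format: {!r}".format(key))
    return match.group("name"), int(match.group("version"))


gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}

versions_num = defaultdict(lambda: {"versions": set(), "users": set()})
for key, user in gen_dict.items():
    name, version = parse_key(key)
    versions_num[name]["versions"].add(version)
    versions_num[name]["users"].add(user)

for name, data in sorted(versions_num.items()):
    print("Item {}\tno. of Versions: {}\tUsers: {}".format(
        name, len(data["versions"]), ", ".join(sorted(data["users"]))))
```

Raising on an unexpected key makes malformed names visible instead of silently grouping them under a wrong prefix.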

Answer 2 (score: 1)

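I would use a defaultdict to keep track of the users and a plain dict to keep track of the counts. The dict.get() method lets you return a default value when the key is not found, in this case 0, so each time a key is seen you just add 1 to it.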

from collections import defaultdict

gen_dict = {
 "item_C_v001" : "jack",
 "item_C_v002" : "kris",
 "item_A_v003" : "john",
 "item_B_v006" : "peter",
 "item_A_v005" : "john",
 "item_A_v004" : "dave"
}

user_dict = defaultdict(set)
count_dict = {}

for item_name, user in gen_dict.items():
    # -5 strips the whole "_vNNN" suffix; the question's -3 leaves "_v" behind
    name = item_name[:-5]
    user_dict[name].add(user)
    count_dict[name] = count_dict.get(name, 0) + 1

for name, num in sorted(count_dict.items()):
    print("Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(
        name, num, ', '.join(sorted(user_dict[name]))))
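The dict.get(key, 0) + 1 idiom can also be expressed with collections.Counter, which supplies the missing-key default of 0 by itself. The sketch below runs both tallies side by side to show they agree. It assumes the version is whatever follows the last underscore and uses rsplit rather than a fixed slice; the name manual_counts is made up for the comparison.

```python
from collections import Counter, defaultdict

gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}

user_dict = defaultdict(set)
count_dict = Counter()

for item_name, user in gen_dict.items():
    name = item_name.rsplit("_", 1)[0]  # drop the trailing "_vNNN" part
    user_dict[name].add(user)
    count_dict[name] += 1  # missing keys start at 0 automatically

# The same tally written with dict.get, for comparison.
manual_counts = {}
for item_name in gen_dict:
    name = item_name.rsplit("_", 1)[0]
    manual_counts[name] = manual_counts.get(name, 0) + 1

assert manual_counts == dict(count_dict)

for name, num in sorted(count_dict.items()):
    print("Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(
        name, num, ", ".join(sorted(user_dict[name]))))
```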

Answer 3 (score: 1)

An example in IPython:

In [1]: gen_dict = {
   ...:  "item_C_v001" : "jack",
   ...:  "item_C_v002" : "kris",
   ...:  "item_A_v003" : "john",
   ...:  "item_B_v006" : "peter",
   ...:  "item_A_v005" : "john",
   ...:  "item_A_v004" : "dave"
   ...: }

Get the keys; we will need them more than once.

In [2]: keys = tuple(gen_dict.keys())

Find the set of items.

In [3]: items = set(j[:-5] for j in keys)

Table header and template.

In [4]: header = 'Item Name     | No. of Vers.      | User'

In [5]: template = '{:14}|{:<15}|{}'

Print the relevant information for all items.

In [6]: print(header)
Item Name     | No. of Vers.      | User

In [7]: for i in items:
   ...:     relevant = tuple(j for j in keys if j.startswith(i))
   ...:     users = set(gen_dict[x] for x in relevant)
   ...:     print(template.format(i, len(relevant), ' '.join(users)))
   ...:     
item_A        |3              |john dave
item_B        |1              |peter
item_C        |2              |kris jack
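Matching keys with startswith works for this data, but a prefix such as item_A would also match a longer name like item_AB. The sketch below compares the exact prefix instead and sorts the items and users so the output order is stable. The extra key "item_AB_v001" with user "mary" is invented purely to show the difference, and the prefix helper is a hypothetical name.

```python
gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
    "item_AB_v001": "mary",  # hypothetical extra key, not in the original data
}


def prefix(key):
    """Return the item name: everything before the last underscore."""
    return key.rsplit("_", 1)[0]


header = "Item Name     | No. of Vers.      | User"
template = "{:<14}| {:<18}| {}"

rows = []
for item in sorted(set(prefix(k) for k in gen_dict)):
    # Exact comparison, so "item_A" does not pick up "item_AB" keys.
    relevant = [k for k in gen_dict if prefix(k) == item]
    users = sorted(set(gen_dict[k] for k in relevant))
    rows.append((item, len(relevant), users))

print(header)
for item, count, users in rows:
    print(template.format(item, count, ", ".join(users)))

# For contrast: startswith counts the item_AB key as a version of item_A.
naive = [k for k in gen_dict if k.startswith("item_A")]
```

With the invented key present, naive holds four keys while the exact match reports three versions for item_A.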