Question

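I have a list of dicts, and some of the dicts are identical. I want to find the duplicates and put them into a new list or dict together with how many times each one occurs.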

import itertools

myListCombined = list()
for a, b in itertools.combinations(myList, 2):
    is_equal = set(a.items()) - set(b.items())
    if len(is_equal) == 0:
        a.update(count=2)
        myListCombined.append(a)
    else:
        a.update(count=1)
        b.update(count=1)
        myListCombined.append(a)
        myListCombined.append(b)

# Drop every dict that appears again later in the list
myListCombined = [i for n, i in enumerate(myListCombined) if i not in myListCombined[n + 1:]]

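This code sort of works, but only when a dict is duplicated exactly twice in the list; a.update(count=2) is wrong when there are more copies. I also remove the duplicate dicts in the last line, but I'm not sure it will work reliably.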
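Why the pairwise approach breaks beyond two copies: itertools.combinations(myList, 2) yields every pair, so a dict that appears k times takes part in k * (k - 1) / 2 equal pairs, and each of those pairs can only ever assign count=2. A small check on the sample input (renamed my_list here so the snippet runs on its own):

```python
import itertools

my_list = [{'name': 'Mary', 'age': 25, 'salary': 1000},
           {'name': 'John', 'age': 25, 'salary': 2000},
           {'name': 'George', 'age': 30, 'salary': 2500},
           {'name': 'John', 'age': 25, 'salary': 2000},
           {'name': 'John', 'age': 25, 'salary': 2000}]

# Count how many of the 2-element combinations compare equal.
equal_pairs = sum(1 for a, b in itertools.combinations(my_list, 2) if a == b)
print(equal_pairs)  # 3: John appears 3 times, giving 3 * 2 / 2 equal pairs
```

So the number of equal pairs grows faster than the number of copies, which is why a fixed count=2 cannot be right for three or more duplicates.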

Input:

[{'name': 'Mary', 'age': 25, 'salary': 1000},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'George', 'age': 30, 'salary': 2500},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'John', 'age': 25, 'salary': 2000}]

Expected output:

[{'name': 'Mary', 'age': 25, 'salary': 1000, 'count': 1},
{'name': 'John', 'age': 25, 'salary': 2000, 'count': 3},
{'name': 'George', 'age': 30, 'salary': 2500, 'count': 1}]

Answer 1

You can try the following: first convert each dict into a frozenset of (key, value) tuples, so that it is hashable, as collections.Counter requires.

import collections
a = [{'a':1}, {'a':1},{'b':2}]
print(collections.Counter(map(lambda x: frozenset(x.items()),a)))
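A minimal sketch of why the frozenset step is needed: dicts are unhashable, so handing them to Counter directly raises TypeError, while a frozenset of their items works as a key. The toy list a is redefined so the snippet runs on its own.

```python
import collections

a = [{'a': 1}, {'a': 1}, {'b': 2}]

# Dicts are mutable and therefore unhashable, so Counter cannot use them as keys.
raised = False
try:
    collections.Counter(a)
except TypeError:
    raised = True
print("Counter(list of dicts) raised TypeError:", raised)

# A frozenset of (key, value) pairs is hashable, and equal dicts give equal frozensets.
counts = collections.Counter(frozenset(x.items()) for x in a)
print(counts[frozenset({('a', 1)})])  # 2
print(counts[frozenset({('b', 2)})])  # 1
```

This only works when all values are themselves hashable (numbers, strings, tuples); a dict holding a list value would still raise TypeError.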

Edit to reflect your desired input/output:

from copy import deepcopy

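def count_duplicate_dicts(list_of_dicts):
    # Snapshot the input so the counts aren't affected by the 'count' keys added below.
    cpy = deepcopy(list_of_dicts)
    for d in list_of_dicts:
        # list.count compares with ==, so it counts every dict equal to d.
        # Note: this mutates the input and keeps duplicate entries in the result.
        d['count'] = cpy.count(d)
    return list_of_dicts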

x = [{'a':1},{'a':1}, {'c':3}]
print(count_duplicate_dicts(x))
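The function above keeps every duplicate entry and mutates its argument. A sketch of a variant that returns one copy of each distinct dict instead, leaving the input untouched; count_unique_dicts is a hypothetical name, and like the original it relies on == comparisons (so it also works with unhashable values) at O(n^2) cost.

```python
def count_unique_dicts(list_of_dicts):
    """Hypothetical helper: one copy of each distinct dict, with a 'count' key added."""
    result = []
    seen = []  # distinct dicts already emitted, in first-seen order
    for d in list_of_dicts:
        if d in seen:  # membership test uses ==, no hashing needed
            continue
        seen.append(d)
        # Build a new dict so the caller's data is left untouched, then attach the count.
        result.append({**d, 'count': list_of_dicts.count(d)})
    return result

x = [{'a': 1}, {'a': 1}, {'c': 3}]
print(count_unique_dicts(x))  # [{'a': 1, 'count': 2}, {'c': 3, 'count': 1}]
```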

Answer 2

You can use collections.Counter to get the counts, then rebuild the dicts after adding each count from the Counter to its frozenset:

from collections import Counter

l = [dict(d | {('count', c)}) for d, c in Counter(frozenset(d.items()) 
                                                  for d in myList).items()]  
print(l)
# [{'salary': 1000, 'name': 'Mary', 'age': 25, 'count': 1}, 
#  {'name': 'John', 'salary': 2000, 'age': 25, 'count': 3}, 
#  {'salary': 2500, 'name': 'George', 'age': 30, 'count': 1}]
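Because frozensets are unordered, the rebuilt dicts come back with their keys in arbitrary order. A sketch of a variant, assuming you want each output dict to keep its original key order as in the expected output: Counter still does the counting, and a plain dict remembers the first dict seen for each key. my_list is the question's sample input, renamed so the block is self-contained.

```python
from collections import Counter

my_list = [{'name': 'Mary', 'age': 25, 'salary': 1000},
           {'name': 'John', 'age': 25, 'salary': 2000},
           {'name': 'George', 'age': 30, 'salary': 2500},
           {'name': 'John', 'age': 25, 'salary': 2000},
           {'name': 'John', 'age': 25, 'salary': 2000}]

# Count each distinct dict by its hashable frozenset-of-items key.
counts = Counter(frozenset(d.items()) for d in my_list)

# Remember the first dict seen for each key so the original key order survives.
first_seen = {}
for d in my_list:
    first_seen.setdefault(frozenset(d.items()), d)

result = [{**d, 'count': counts[key]} for key, d in first_seen.items()]
print(result)
```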

Answer 3

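If your dicts are well structured, their contents are simple data types such as numbers and strings, and you have further data analysis to do afterwards, I suggest using pandas, which provides rich functionality. Here is sample code for your case: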

In [32]: data = [{'name': 'Mary', 'age': 25, 'salary': 1000},
    ...: {'name': 'John', 'age': 25, 'salary': 2000},
    ...: {'name': 'George', 'age': 30, 'salary': 2500},
    ...: {'name': 'John', 'age': 25, 'salary': 2000},
    ...: {'name': 'John', 'age': 25, 'salary': 2000}]
    ...: 
    ...: import pandas as pd
    ...: df = pd.DataFrame(data)
    ...: df['counts'] = 1
    ...: df = df.groupby(df.columns.tolist()[:-1]).sum().reset_index(drop=False)
    ...: 

In [33]: df
Out[33]: 
   age    name  salary  counts
0   25    John    2000       3
1   25    Mary    1000       1
2   30  George    2500       1

In [34]: df.to_dict(orient='records')
Out[34]: 
[{'age': 25, 'counts': 3, 'name': 'John', 'salary': 2000},
 {'age': 25, 'counts': 1, 'name': 'Mary', 'salary': 1000},
 {'age': 30, 'counts': 1, 'name': 'George', 'salary': 2500}]

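The logic is:

(1) First, build a DataFrame from the data.

(2) groupby applies an aggregation function to each group; here the rows are grouped by all the original columns and the helper counts column is summed.

(3) To convert the result back to dicts, call df.to_dict(orient='records').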
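For readers without pandas, the same group-and-count idea from step (2) can be sketched in plain Python with itertools.groupby. This swaps pandas for the standard library; it assumes every dict has the same keys and that values under the same key are mutually comparable, which the sort needs.

```python
from itertools import groupby

data = [{'name': 'Mary', 'age': 25, 'salary': 1000},
        {'name': 'John', 'age': 25, 'salary': 2000},
        {'name': 'George', 'age': 30, 'salary': 2500},
        {'name': 'John', 'age': 25, 'salary': 2000},
        {'name': 'John', 'age': 25, 'salary': 2000}]

# Normalise each dict to a sorted tuple of (key, value) pairs so equal dicts
# become equal, sortable rows.
rows = sorted(tuple(sorted(d.items())) for d in data)

# groupby only merges *adjacent* equal items, hence the sort above.
result = [dict(key, count=len(list(group))) for key, group in groupby(rows)]
print(result)
```

Like the pandas version, the rows come out sorted by age, then name.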

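pandas is a big package and takes some time to learn, but it is worth knowing. It is very powerful and can make your data analysis faster and more elegant.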

Thanks.

Answer 4

You can try this:

import collections

d = [{'name': 'Mary', 'age': 25, 'salary': 1000},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'George', 'age': 30, 'salary': 2500},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'John', 'age': 25, 'salary': 2000}]

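# Counts by 'name' only, so different records sharing a name would be merged.
count = dict(collections.Counter([i["name"] for i in d]))
# tuple(items) is hashable; identical dicts also need the same key order to dedupe.
a = list(set(map(tuple, [i.items() for i in d])))
final_dict = [dict(list(i)+[("count", count[dict(i)["name"]])]) for i in a]
print(final_dict)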

Output:

[{'salary': 2000, 'count': 3, 'age': 25, 'name': 'John'}, {'salary': 2500, 'count': 1, 'age': 30, 'name': 'George'}, {'salary': 1000, 'count': 1, 'age': 25, 'name': 'Mary'}]
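The code above counts by the name field alone, so two different records that happen to share a name would both get the combined count. A sketch that keys the count on the whole dict instead (a frozenset of its items, assuming all values are hashable), using a small hypothetical input with two different Johns:

```python
import collections

d = [{'name': 'John', 'age': 25, 'salary': 2000},
     {'name': 'John', 'age': 40, 'salary': 3000},  # different record, same name
     {'name': 'John', 'age': 25, 'salary': 2000}]

# Key on the full set of (key, value) pairs rather than on i["name"] alone,
# so dicts that merely share a name are counted separately.
count = collections.Counter(frozenset(i.items()) for i in d)
final_list = [dict(k, count=n) for k, n in count.items()]
print(final_list)
```

Key order inside each rebuilt dict is arbitrary because it comes from a frozenset.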

Python: find duplicate dicts in a list and separate them with counts
