Question

I have a nested dictionary and I'm trying to find duplicates in it. For example, if I have:

dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}

the return value would be something like:

True

because the dictionary contains duplicates.

I can do this quite easily with a regular dictionary, and I thought the same approach would work in this case too:

dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
rev_dictionary = {}

for key, value in dictionary.items():
    rev_dictionary.setdefault(value, set()).add(key)
    print(rev_dictionary)

for key,values in dictionary.items():
    if len(values) > 1:
        values = True
    else:
        values = False

But it raises the following error:

TypeError: unhashable type: 'dict'

How can I get this to work?

Thanks for your help!

Note: if possible, I'd prefer a solution that doesn't use libraries.

Answer 1

I'm assuming you define duplicates by value, not by key. In that case, you can flatten the nested dictionary using the following (mentioned here):

def flatten(d):
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]
        if isinstance(val, list):
            for subdict in val:
                deeper = flatten(subdict).items()
                out.update({key + '_' + key2: val2 for key2, val2 in deeper})
        else:
            out[key] = val
    return out

Then check the condition:

v = flatten(d).values()
len(set(v))!=len(v)

The result is True.
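To make the check concrete, here is a minimal sketch that applies the same flatten function to the dictionary from the question and wraps the comparison in a helper. The name has_duplicate_values is hypothetical and not part of the original answer, and the sketch assumes all keys are strings and all leaf values are hashable.

```python
def flatten(d):
    """Flatten a nested dict, joining nested keys with '_' (keys must be strings)."""
    out = {}
    for key, val in d.items():
        if isinstance(val, dict):
            val = [val]
        if isinstance(val, list):
            for subdict in val:
                deeper = flatten(subdict).items()
                out.update({key + '_' + key2: val2 for key2, val2 in deeper})
        else:
            out[key] = val
    return out


def has_duplicate_values(d):
    # Hypothetical helper: True if any leaf value occurs more than once.
    values = list(flatten(d).values())
    return len(set(values)) != len(values)


dictionary = {'hello': 3, 'world': {'this': 5, 'is': {'a': 3, 'dict': None}}}
print(flatten(dictionary))
print(has_duplicate_values(dictionary))
```

The flattened values here are 3, 5, 3 and None, so the set is one element shorter than the list and the helper reports a duplicate.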

Answer 2

I wrote a simple solution:

dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}

def get_dups(a, values=None):
    if values is None: values = []
    if (a in values): return True
    values.append(a)
    if type(a) == dict:
        for i in a.values():
            if (get_dups(i, values=values)):
                return True
    return False

print(get_dups(dictionary))

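How it works

We start by saving every value in a list, which is passed along through the function. On each call, we check whether the current value is already in that list, and return True as soon as a duplicate is found.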

if (a in values): return True

Next, if the current value is itself a dictionary, we iterate over its values and run get_dups on each of them.
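The list lookup in get_dups is linear in the number of values seen so far. As a possible variant, the sketch below records only leaf values in a set, so each lookup is roughly constant time. The name has_duplicate_leaf is hypothetical, and the sketch assumes every leaf value is hashable. Unlike get_dups, it does not compare whole sub-dictionaries with each other.

```python
def has_duplicate_leaf(node, seen=None):
    """Return True if any non-dict value appears more than once (assumes hashable leaves)."""
    if seen is None:
        seen = set()
    if isinstance(node, dict):
        # any() stops at the first child that reports a duplicate.
        return any(has_duplicate_leaf(child, seen) for child in node.values())
    if node in seen:
        return True
    seen.add(node)
    return False


dictionary = {'hello': 3, 'world': {'this': 5, 'is': {'a': 3, 'dict': None}}}
print(has_duplicate_leaf(dictionary))
```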

Answer 3

I think all you need is to flatten the dictionary before passing it to your duplicate-detection pipeline:

import pandas as pd

def flatten_dict(d):
    # Since pandas 1.0 this is exposed as pd.json_normalize (formerly pd.io.json.json_normalize)
    df = pd.json_normalize(d, sep='_')
    return df.to_dict(orient='records')[0]

dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}

dictionary = flatten_dict(dictionary)
print('flattend')
print(dictionary)

rev_dictionary = {}

for key, value in dictionary.items():
    rev_dictionary.setdefault(value, set()).add(key)

print('reversed')
print(rev_dictionary)

is_duplicate = False
for key, values in rev_dictionary.items():
    if len(values) > 1:
        is_duplicate = True
        break

print('is duplicate?', is_duplicate)

Result:

flattend
{'hello': 3, 'world_is_a': 3, 'world_is_dict': None, 'world_this': 5}
reversed
{3: {'hello', 'world_is_a'}, None: {'world_is_dict'}, 5: {'world_this'}}
is duplicate? True

The code for flattening the dictionary was borrowed from Flatten nested Python dictionaries, compressing keys.
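Since the question asks for a solution without libraries, the same flatten-then-reverse pipeline can be sketched in plain Python. This is only an illustration: the helper names flatten_dict (reimplemented here without pandas) and duplicated_values are assumptions, and leaf values are assumed to be hashable.

```python
def flatten_dict(d, parent_key="", sep="_"):
    """Flatten a nested dict into one level, joining nested keys with sep."""
    flat = {}
    for key, value in d.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def duplicated_values(d):
    """Map each value that occurs more than once to the set of flattened keys holding it."""
    rev = {}
    for key, value in flatten_dict(d).items():
        rev.setdefault(value, set()).add(key)
    return {value: keys for value, keys in rev.items() if len(keys) > 1}


dictionary = {'hello': 3, 'world': {'this': 5, 'is': {'a': 3, 'dict': None}}}
print(duplicated_values(dictionary))
```

An empty result means there are no duplicates, so bool(duplicated_values(d)) gives the True/False answer while the mapping itself shows which keys collide.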

Answer 4

You can recursively add the item values of sub-dictionaries to a set, and if any item value has already been "seen" in the set, raise an exception so that the wrapper can return True to indicate that a duplicate was found:

def has_dupes(d):
    def values(d):
        seen = set()
        for k, v in d.items():
            if isinstance(v, dict):
                s = values(v)
                if seen & s:
                    raise RuntimeError()
                seen.update(s)
            else:
                if v in seen:
                    raise RuntimeError()
                seen.add(v)
        return seen
    try:
        values(d)
    except RuntimeError:
        return True
    return False

So, given your example input, has_dupes(dictionary) will return:

True

Answer 5

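Convert the nested dictionary into a nested list of its values:

def nested_values(v):
    # Build a real list so this also works on Python 3, where map() is lazy
    return [nested_values(x) for x in v.values()] if isinstance(v, dict) else v

Then flatten the nested list into a single list of all the values in the dictionary, and check the flattened list for duplicates:

from itertools import chain

def flatten(v):
    # Recurse into nested lists; wrap scalars so chain can iterate over them
    return chain.from_iterable(map(flatten, v)) if isinstance(v, list) else [v]

def is_duplicated_value(d):
    flat = list(flatten(nested_values(d)))
    return len(flat) != len(set(flat))

Test:

print(is_duplicated_value({1:'a', 2:'b', 3:{1:'c', 2:'a'}}))
print(is_duplicated_value({1:'a', 2:'b', 3:{1:'c', 2:'d'}}))

Output: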

True
False

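Depending on how the dictionaries are used, how large they are, and so on, you may want to recast these steps as a recursive function that adds each value to a set, checking whether the value is already in the set before adding it, and returns True immediately on a duplicate, or False if the dictionary is exhausted:

class Duplicated(ValueError): pass

def is_dup(d):
    values = set()
    def add(v):
        if isinstance(v, dict):
            # Explicit loop: on Python 3, map() is lazy and would never call add
            for item in v.values():
                add(item)
        else:
            if v in values:
                raise Duplicated
            else:
                values.add(v)
    try:
        add(d)
        return False
    except Duplicated:
        return True

Test:

print(is_dup({1:'a', 2:'b', 3:{1:'c', 2:'a'}}))
print(is_dup({1:'a', 2:'b', 3:{1:'c', 2:'d'}}))

Output:

True
False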

How to find duplicates or repeated items in a nested dictionary?
