从字典中删除重复项

时间:2012-01-05 20:17:23

标签: python dictionary duplicates

我有以下Python 2.7字典数据结构(我不控制源数据 - 来自另一个系统):

{112762853378: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112762853385: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112760496444: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4']
   },
 112760496502: 
   {'dst': ['10.122.195.34'], 
    'src': ['4.3.2.1']
   },
 112765083670: ...
}

字典键始终是唯一的。 Dst,src和别名可以是重复的。所有记录将始终具有dst和src,但并非每条记录都必须具有第三条记录中显示的别名。

在样本数据中,前两个记录中的任何一个都将被删除(对我来说无关紧要)。第三条记录将被认为是唯一的,因为虽然dst和src是相同的,但它缺少别名。

我的目标是删除dst,src和别名全部重复的所有记录 - 无论密钥是什么。

这个菜鸟如何实现这个目标?

另外,我对Python的有限理解将数据结构解释为字典,其值存储在字典中......这是一个dicts的字典,这是正确的吗?

11 个答案:

答案 0 :(得分:37)

如果值不在结果字典中,您可以浏览字典中的每个项目(键值对)并将它们添加到结果字典中。

input_raw = {112762853378: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112762853385: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112760496444: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4']
   },
 112760496502: 
   {'dst': ['10.122.195.34'], 
    'src': ['4.3.2.1']
   }
}

result = {}

for key,value in input_raw.items():
    if value not in result.values():
        result[key] = value

print result

答案 1 :(得分:3)

一种简单的方法是使用每个内部字典中的字符串数据的串联作为关键字来创建反向字典。所以说你在字典中有上述数据,d

>>> import collections
>>> reverse_d = collections.defaultdict(list)
>>> for key, inner_d in d.iteritems():
...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
...     reverse_d[key_str].append(key)
... 
>>> duplicates = [keys for key_str, keys in reverse_d.iteritems() if len(keys) > 1]
>>> duplicates
[[112762853385, 112762853378]]

如果你不想要一个重复列表或类似的东西,但只想创建一个无副本的字典,你可以只使用常规字典而不是defaultdict并重新反转它这样:

>>> for key, inner_d in d.iteritems():
...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
...     reverse_d[key_str] = key
>>> new_d = dict((val, d[val]) for val in reverse_d.itervalues())

答案 2 :(得分:3)

另一个反向字典变体:

>>> import pprint
>>> 
>>> data = {
...   112762853378: 
...    {'dst': ['10.121.4.136'], 
...     'src': ['1.2.3.4'], 
...     'alias': ['www.example.com']
...    },
...  112762853385: 
...    {'dst': ['10.121.4.136'], 
...     'src': ['1.2.3.4'], 
...     'alias': ['www.example.com']
...    },
...  112760496444: 
...    {'dst': ['10.121.4.136'], 
...     'src': ['1.2.3.4']
...    },
...  112760496502: 
...    {'dst': ['10.122.195.34'], 
...     'src': ['4.3.2.1']
...    },
... }
>>> 
>>> keep = set({repr(sorted(value.items())):key
...             for key,value in data.iteritems()}.values())
>>> 
>>> for key in data.keys():
...     if key not in keep:
...         del data[key]
... 
>>> 
>>> pprint.pprint(data)
{112760496444L: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
 112760496502L: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
 112762853378L: {'alias': ['www.example.com'],
                 'dst': ['10.121.4.136'],
                 'src': ['1.2.3.4']}}

答案 3 :(得分:3)

input_raw = {112762853378:  {'dst': ['10.121.4.136'],
                             'src': ['1.2.3.4'],
                             'alias': ['www.example.com']    },
             112762853385:  {'dst': ['10.121.4.136'],
                             'src': ['1.2.3.4'],
                             'alias': ['www.example.com']    },
             112760496444:  {'dst': ['10.121.4.299'],
                             'src': ['1.2.3.4']    },
             112760496502:  {'dst': ['10.122.195.34'],
                             'src': ['4.3.2.1']    },
             112758601487:  {'src': ['1.2.3.4'],
                             'alias': ['www.example.com'],
                             'dst': ['10.121.4.136']},
             112757412898:  {'dst': ['10.122.195.34'],
                             'src': ['4.3.2.1']    },
             112757354733:  {'dst': ['124.12.13.14'],
                             'src': ['8.5.6.0']},             
             }

for x in input_raw.iteritems():
    print x
print '\n---------------------------\n'

seen = []

for k,val in input_raw.items():
    if val in seen:
        del input_raw[k]
    else:
        seen.append(val)


for x in input_raw.iteritems():
    print x

结果

(112762853385L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757354733L, {'src': ['8.5.6.0'], 'dst': ['124.12.13.14']})
(112758601487L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757412898L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496502L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496444L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.299']})
(112762853378L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})

---------------------------

(112762853385L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757354733L, {'src': ['8.5.6.0'], 'dst': ['124.12.13.14']})
(112757412898L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496444L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.299']})

此解决方案首先创建了一个列表 input_raw.iteritems()(如Andrew的Cox答案中所示),并且需要增加列表可见,这些都是缺点。
但是第一个是无法避免的(使用iteritems()不起作用)第二个不如重新创建列表 result.values()从增长列表结果< / strong>循环的每一个回合。

答案 4 :(得分:2)

由于在对应关系中找到唯一性的方法正是使用字典,所需的唯一值是关键字,所以要创建一个颠倒的字典,其中您的值被组成键 - 然后重新创建一个使用中间结果“去掉”字典。

dct = {112762853378: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112762853385: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112760496444: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4']
   },
 112760496502: 
   {'dst': ['10.122.195.34'], 
    'src': ['4.3.2.1']
   },
   }

def remove_dups (dct):
    reversed_dct = {}
    for key, val in dct.items():
        new_key = tuple(val["dst"]) + tuple(val["src"]) + (tuple(val["alias"]) if "alias" in val else (None,) ) 
        reversed_dct[new_key] = key
    result_dct = {}
    for key, val in reversed_dct.items():
        result_dct[val] = dct[val]
    return result_dct

result = remove_dups(dct)

答案 5 :(得分:2)

dups={}

for key,val in dct.iteritems():
    if val.get('alias') != None:
        ref = "%s%s%s" % (val['dst'] , val['src'] ,val['alias'])# a simple hash
        dups.setdefault(ref,[]) 
        dups[ref].append(key)

for k,v in dups.iteritems():
    if len(v) > 1:
        for key in v:
            del dct[key]

答案 6 :(得分:1)

from collections import defaultdict

dups = defaultdict(lambda : defaultdict(list))

for key, entry in data.iteritems():
    dups[tuple(entry.keys())][tuple([v[0] for v in entry.values()])].append(key)

for dup_indexes in dups.values():
    for keys in dup_indexes.values():
        for key in keys[1:]:
            if key in data:
                del data[key]

答案 7 :(得分:0)

我只创建一组键列表,然后将它们遍历为新的字典:

input_raw = {112762853378: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112762853385: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112760496444: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4']
   },
 112760496502: 
   {'dst': ['10.122.195.34'], 
    'src': ['4.3.2.1']
   }
}

filter = list(set(list(input_raw.keys())))

fixedlist = {}

for i in filter:
    fixedlist[i] = logins[i]

答案 8 :(得分:0)

我使用压缩字典方法解决了它:

dic = {112762853378: 
    {'dst': ['10.121.4.136'], 
     'src': ['1.2.3.4'], 
     'alias': ['www.example.com']
    },
112762853385: 
    {'dst': ['10.121.4.136'], 
     'src': ['1.2.3.4'], 
     'alias': ['www.example.com']
    },
112760496444: 
    {'dst': ['10.121.4.136'], 
     'src': ['1.2.3.4']
    },
112760496502: 
    {'dst': ['10.122.195.34'], 
     'src': ['4.3.2.1']
    }
}

result = {k:v for k,v in dic.items() if list(dic.values()).count(v)==1}

答案 9 :(得分:-1)

您可以使用

set(dictionary) 

解决您的问题。

答案 10 :(得分:-3)

example = {
    'id1':  {'name': 'jay','age':22,},
    'id2': {'name': 'salman','age': 52,},
    'id3': {'name':'Ranveer','age' :26,},
    'id4': {'name': 'jay', 'age': 22,},
}
for item in example:
    for value in example:
        if example[item] ==example[value]:
            if item != value:
                 key = value 
                 del example[key]
print "example",example