In Python, how do I remove duplicate values from a list of dictionaries inside a dictionary?

Time: 2017-08-22 09:54:02

Tags: python dictionary

The first set of data, which I get from the database, looks like this:

[
    {u'ip': u'13.82.28.61', u'scanid': 1000, u'port': 443},
    {u'ip': u'206.190.36.45', u'scanid': 1001, u'port': 80},
    {u'ip': u'98.139.180.149', u'scanid': 1001, u'port': 80},
    {u'ip': u'98.138.253.109', u'scanid': 1001, u'port': 80},
    {u'ip': u'91.198.174.192', u'scanid': 1002, u'port': 110},
    {u'ip': u'91.198.174.192', u'scanid': 1002, u'port': 31337}
]

I need the data grouped by scanid, like this:

{
    scanid : [{ip : [port1, port2 ...]}, {ip2 : [port3 ...]}],
    scanid : [{ip3 : [port1, port2 ...]}, {ip4 : [port3 ...]}],
    ...
}

Here, an IP must not be repeated within a scanid. For example,

{
    1000: [{u'13.82.28.61': [443]}],
    1001: [{u'206.190.36.45': [80]}, {u'98.139.180.149': [80]}, {u'98.138.253.109': [80]}], 
    1002: [{u'91.198.174.192': [110, 31337]}]
}

I tried the following code:

from collections import defaultdict

d = defaultdict(list)
dictionary_with_scanid = defaultdict(list)
for rs in resultset:
    scanid = rs['scanid']
    domain = rs['ip']       
    port = rs['port']
    d[domain].append(port)
    dictionary_with_scanid[scanid].append({domain:d[domain]})

But I get duplicate data for scanid=1002:

{
    1000: [{u'13.82.28.61': [443]}],
    1001: [{u'206.190.36.45': [80]}, {u'98.139.180.149': [80]}, {u'98.138.253.109': [80]}], 
    1002: [{u'91.198.174.192': [110, 31337]}, {u'91.198.174.192': [110, 31337]}]
}

This is my second set of data, in which scanid 1002 has the same entry duplicated:

1002: [{u'91.198.174.192': [110, 31337]}, {u'91.198.174.192': [110, 31337]}]

I want the data below, without duplicates, whether it is built from the first set of data or from the second:

{
    1000: [{u'13.82.28.61': [443]}],
    1001: [{u'206.190.36.45': [80]}, {u'98.139.180.149': [80]}, {u'98.138.253.109': [80]}], 
    1002: [{u'91.198.174.192': [110, 31337]}]
}

2 answers:

Answer 0 (score: 0)

You have too much nesting; just use one dictionary per scanid. I use setdefault here, but you can get a similar result with defaultdict:

data = ... # your original data 
scans = {}
for d in data:
    scans.setdefault(d['scanid'], {}).setdefault(d['ip'], []).append(d['port'])
print(scans)

Result:

{1000: {u'13.82.28.61': [443]}, 
 1001: {u'206.190.36.45': [80], 
        u'98.139.180.149': [80], 
        u'98.138.253.109': [80]}, 
 1002: {u'91.198.174.192': [110, 31337]}}
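
If the list-of-single-key-dicts shape from the question is really required, the nested dictionary can be converted afterwards. The following is only a minimal sketch: to_list_of_dicts is a hypothetical helper name, not something from the question or from the standard library, and the scans literal simply repeats the result shown above.

```python
# Nested result in the {scanid: {ip: [ports]}} shape shown above.
scans = {
    1000: {'13.82.28.61': [443]},
    1001: {'206.190.36.45': [80],
           '98.139.180.149': [80],
           '98.138.253.109': [80]},
    1002: {'91.198.174.192': [110, 31337]},
}


def to_list_of_dicts(nested):
    """Hypothetical helper: {scanid: {ip: ports}} -> {scanid: [{ip: ports}, ...]}."""
    return {
        scanid: [{ip: ports} for ip, ports in by_ip.items()]
        for scanid, by_ip in nested.items()
    }


result = to_list_of_dicts(scans)
```

Because every IP is a key of the inner dictionary, it can appear only once per scanid in the converted list.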

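The defaultdict setup is a bit trickier because the defaultdicts have to be nested: you need to pass the outer defaultdict a custom factory function that constructs the inner one: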

from collections import defaultdict
scans = defaultdict(lambda: defaultdict(list))
for d in data:
    scans[d['scanid']][d['ip']].append(d['port'])
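
For reference, here is the same nested defaultdict approach as a self-contained sketch that runs on the sample rows from the question. The final conversion to plain dicts is an optional extra step that is assumed here, not part of the answer; it avoids the defaultdict behaviour of silently creating an empty entry when a missing key is looked up later.

```python
from collections import defaultdict

# Sample rows copied from the question (u'' prefixes dropped).
data = [
    {'ip': '13.82.28.61', 'scanid': 1000, 'port': 443},
    {'ip': '206.190.36.45', 'scanid': 1001, 'port': 80},
    {'ip': '98.139.180.149', 'scanid': 1001, 'port': 80},
    {'ip': '98.138.253.109', 'scanid': 1001, 'port': 80},
    {'ip': '91.198.174.192', 'scanid': 1002, 'port': 110},
    {'ip': '91.198.174.192', 'scanid': 1002, 'port': 31337},
]

# Outer key: scanid, inner key: ip, value: list of ports.
scans = defaultdict(lambda: defaultdict(list))
for d in data:
    scans[d['scanid']][d['ip']].append(d['port'])

# Optional: freeze into plain dicts so missing keys raise KeyError.
plain = {scanid: dict(by_ip) for scanid, by_ip in scans.items()}
```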

Answer 1 (score: 0)

I would simply make dictionary_with_scanid a dictionary of dictionaries to avoid duplicate IPs:

from collections import defaultdict

d = defaultdict(list)
dictionary_with_scanid = defaultdict(dict)  # use dict instead of list
for rs in resultset:
    scanid = rs['scanid']
    domain = rs['ip']
    port = rs['port']
    d[domain].append(port)
    # just use the previously updated d[domain] for dictionary_with_scanid[scanid]
    dictionary_with_scanid[scanid][domain] = d[domain]

It gives the expected result:

import pprint
pprint.pprint(dict(dictionary_with_scanid))
{1000: {'13.82.28.61': [443]},
 1001: {'206.190.36.45': [80], '98.138.253.109': [80], '98.139.180.149': [80]},
 1002: {'91.198.174.192': [110, 31337]}}
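
One caveat, which does not affect the sample data: d is keyed by IP only, so if the same IP shows up under two different scanids, both entries end up pointing at one shared port list. The sketch below illustrates this with made-up rows; group_shared and group_per_scan are hypothetical names introduced only for the comparison.

```python
from collections import defaultdict

# Made-up rows: the same IP appears under two different scanids.
rows = [
    {'ip': '10.0.0.1', 'scanid': 1, 'port': 80},
    {'ip': '10.0.0.1', 'scanid': 2, 'port': 443},
]


def group_shared(rows):
    """Same logic as the loop above: one port list per IP, shared across scanids."""
    d = defaultdict(list)
    out = defaultdict(dict)
    for rs in rows:
        d[rs['ip']].append(rs['port'])
        out[rs['scanid']][rs['ip']] = d[rs['ip']]
    return dict(out)


def group_per_scan(rows):
    """Variant that keeps a separate port list for each (scanid, ip) pair."""
    out = defaultdict(dict)
    for rs in rows:
        out[rs['scanid']].setdefault(rs['ip'], []).append(rs['port'])
    return dict(out)


shared = group_shared(rows)      # ports from both scans leak into each entry
separate = group_per_scan(rows)  # ports stay with their own scanid
```

If an IP can never occur under more than one scanid, the two variants should produce the same result.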