The first set of data I get from the database looks like this:
[
{u'ip': u'13.82.28.61', u'scanid': 1000, u'port': 443},
{u'ip': u'206.190.36.45', u'scanid': 1001, u'port': 80},
{u'ip': u'98.139.180.149', u'scanid': 1001, u'port': 80},
{u'ip': u'98.138.253.109', u'scanid': 1001, u'port': 80},
{u'ip': u'91.198.174.192', u'scanid': 1002, u'port': 110},
{u'ip': u'91.198.174.192', u'scanid': 1002, u'port': 31337}
]
I need to group this data by scanid, like this:
{
scanid : [{ip : [port1, port2 ...]}, {ip2 : [port3 ...]}],
scanid : [{ip3 : [port1, port2 ...]}, {ip4 : [port3 ...]}],
...
}
Within a scanid, IPs should not be repeated. For example:
{
1000: [{u'13.82.28.61': [443]}],
1001: [{u'206.190.36.45': [80]}, {u'98.139.180.149': [80]}, {u'98.138.253.109': [80]}],
1002: [{u'91.198.174.192': [110, 31337]}]
}
I tried the following code:
from collections import defaultdict

d = defaultdict(list)
dictionary_with_scanid = defaultdict(list)
for rs in resultset:
    scanid = rs['scanid']
    domain = rs['ip']
    port = rs['port']
    d[domain].append(port)
    dictionary_with_scanid[scanid].append({domain: d[domain]})
But I get duplicate data for scanid=1002:
{
1000: [{u'13.82.28.61': [443]}],
1001: [{u'206.190.36.45': [80]}, {u'98.139.180.149': [80]}, {u'98.138.253.109': [80]}],
1002: [{u'91.198.174.192': [110, 31337]}, {u'91.198.174.192': [110, 31337]}]
}
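To see where the duplicate comes from, the sketch below reruns the same loop on the sample rows (u'' prefixes dropped, printed with Python 3). Each row appends a new one-key dict, so the second row for scanid 1002 adds a second dict for an IP that is already present. Both dicts point to the same list object in d, which is why both show [110, 31337].

```python
from collections import defaultdict

# Sample rows from the question (u'' prefixes dropped; values unchanged).
resultset = [
    {'ip': '13.82.28.61', 'scanid': 1000, 'port': 443},
    {'ip': '206.190.36.45', 'scanid': 1001, 'port': 80},
    {'ip': '98.139.180.149', 'scanid': 1001, 'port': 80},
    {'ip': '98.138.253.109', 'scanid': 1001, 'port': 80},
    {'ip': '91.198.174.192', 'scanid': 1002, 'port': 110},
    {'ip': '91.198.174.192', 'scanid': 1002, 'port': 31337},
]

# Same loop as in the question.
d = defaultdict(list)
dictionary_with_scanid = defaultdict(list)
for rs in resultset:
    scanid = rs['scanid']
    domain = rs['ip']
    port = rs['port']
    d[domain].append(port)
    # A new one-key dict is appended on every row, even for an IP already seen.
    dictionary_with_scanid[scanid].append({domain: d[domain]})

entries = dictionary_with_scanid[1002]
print(len(entries))  # 2: two dicts for the same IP
# Both dicts hold the same list object, so both show [110, 31337].
print(entries[0]['91.198.174.192'] is entries[1]['91.198.174.192'])  # True
```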
This is my second set of data, but scanid 1002 has the same duplicate entry:
1002: [{u'91.198.174.192': [110, 31337]}, {u'91.198.174.192': [110, 31337]}]
I want the result below, with no duplicates, whether the input is the first set of data or the second:
{
1000: [{u'13.82.28.61': [443]}],
1001: [{u'206.190.36.45': [80]}, {u'98.139.180.149': [80]}, {u'98.138.253.109': [80]}],
1002: [{u'91.198.174.192': [110, 31337]}]
}
Answer 0 (score: 0)
You have too much nesting; just use a single dictionary per scanid. I use setdefault here, but you can get a similar result with defaultdict:
data = ...  # your original data
scans = {}
for d in data:
    scans.setdefault(d['scanid'], {}).setdefault(d['ip'], []).append(d['port'])
print(scans)
Result:
{1000: {u'13.82.28.61': [443]},
1001: {u'206.190.36.45': [80],
u'98.139.180.149': [80],
u'98.138.253.109': [80]},
1002: {u'91.198.174.192': [110, 31337]}}
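Setting up defaultdict is a bit trickier because you need to nest them; you have to pass the outer dictionary a custom function that constructs the inner one: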
from collections import defaultdict

scans = defaultdict(lambda: defaultdict(list))
for d in data:
    scans[d['scanid']][d['ip']].append(d['port'])
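Both answers return a dict of dicts rather than the list of one-key dicts shown in the question. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order) and the sample rows from the question: it groups with the nested defaultdict above, then converts the result to plain dicts and to the question's list shape. The names plain and as_lists are illustrative only.

```python
from collections import defaultdict

# Sample rows from the question (u'' prefixes dropped).
data = [
    {'ip': '13.82.28.61', 'scanid': 1000, 'port': 443},
    {'ip': '206.190.36.45', 'scanid': 1001, 'port': 80},
    {'ip': '98.139.180.149', 'scanid': 1001, 'port': 80},
    {'ip': '98.138.253.109', 'scanid': 1001, 'port': 80},
    {'ip': '91.198.174.192', 'scanid': 1002, 'port': 110},
    {'ip': '91.198.174.192', 'scanid': 1002, 'port': 31337},
]

# Group ports by scanid, then by IP; each IP gets exactly one list.
scans = defaultdict(lambda: defaultdict(list))
for d in data:
    scans[d['scanid']][d['ip']].append(d['port'])

# Convert the nested defaultdicts to plain dicts (nicer to print or serialize).
plain = {scanid: dict(ips) for scanid, ips in scans.items()}

# Reshape into the format asked for in the question: a list of one-key dicts per scanid.
as_lists = {scanid: [{ip: ports} for ip, ports in ips.items()]
            for scanid, ips in plain.items()}
print(as_lists)
```

Because grouping happens before the reshaping step, each IP appears once per scanid. On Python 2 the order of IPs inside each list may differ.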
Answer 1 (score: 0)
I would simply make dictionary_with_scanid a dictionary of dictionaries to avoid duplicate IPs:
from collections import defaultdict

d = defaultdict(list)
dictionary_with_scanid = defaultdict(dict)  # use dict instead of list
for rs in resultset:
    scanid = rs['scanid']
    domain = rs['ip']
    port = rs['port']
    d[domain].append(port)
    # just use the previously updated d[domain] for dictionary_with_scanid[scanid]
    dictionary_with_scanid[scanid][domain] = d[domain]
It gives the expected result:
import pprint
pprint.pprint(dict(dictionary_with_scanid))
{1000: {'13.82.28.61': [443]},
1001: {'206.190.36.45': [80], '98.138.253.109': [80], '98.139.180.149': [80]},
1002: {'91.198.174.192': [110, 31337]}}
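One caveat with this approach: d is keyed by IP only and is shared across all scanids, so if the same IP ever appears under two different scanids, both entries point to the same list and their ports get merged. The sample data does not trigger this. The sketch below uses two hypothetical rows (the IP and scanids are made up) to show the effect, along with one possible fix that keeps a separate list per scanid and IP.

```python
from collections import defaultdict

# Hypothetical rows (not from the question): the same IP under two scanids.
resultset = [
    {'ip': '10.0.0.1', 'scanid': 2000, 'port': 80},
    {'ip': '10.0.0.1', 'scanid': 2001, 'port': 443},
]

# Answer 1's approach: d is keyed by IP only and shared across scanids.
d = defaultdict(list)
shared = defaultdict(dict)
for rs in resultset:
    d[rs['ip']].append(rs['port'])
    shared[rs['scanid']][rs['ip']] = d[rs['ip']]
print(dict(shared))  # {2000: {'10.0.0.1': [80, 443]}, 2001: {'10.0.0.1': [80, 443]}}

# Building the port list inside each scanid's dict keeps the groups separate.
separate = defaultdict(dict)
for rs in resultset:
    separate[rs['scanid']].setdefault(rs['ip'], []).append(rs['port'])
print(dict(separate))  # {2000: {'10.0.0.1': [80]}, 2001: {'10.0.0.1': [443]}}
```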