Python: Dedup a dictionary

Time: 2016-08-31 18:29:53

Tags: python scapy

Good afternoon,

I am reading in a pcap and am basically trying to get a deduplicated list of BSSIDs and ESSIDs. I am still getting duplicates with this code and cannot for the life of me figure out what I am doing wrong:

if not (t[0] in ssid_pairs and ssid_pairs[t[0]] == t[1]):
    ssid_pairs[t[0]] = t[1]
    of.write(t[0] + ',' + t[1] + ((',' + f + '\n') if verbose else '\n'))

ssid_pairs is a dictionary, t[0] is the bssid & t[1] is the essid. An example of the dictionary is:

{'FF:FF:FF:FF:FF:FF':'MyWIFI',...}

I am still seeing multiple instances of the same key->value pair being written to the file. I put in some debugging print statements, and sometimes it recognizes a duplicate and sometimes it does not. The data comes from parsing a pcap with scapy.

Thanks for any help.

*** EDIT: Thanks everyone, I am not really solving my problem the right way with a dictionary. Time to think this through a bit more clearly...

2 answers:

Answer 0 (score: 1)

Dictionaries can't have duplicate keys:

some_data = [('foo', 'bar'),
             ('bang', 'quux'),
             ('foo', 'bar'),
             ('zappo', 'whoo'),
             ]

mydict = {}
for data in some_data:
    mydict[data[0]] = data[1]

import pprint
pprint.pprint(mydict)

The only way you're going to rewrite the same data is if you aren't opening your file in 'w' mode. But this:

with open('outfile.txt', 'w') as of:
    for key in mydict:
        of.write('{},{}{}'.format(key, mydict[key], (',' + f + '\n') if verbose else '\n'))

will never write the same line twice.
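One caveat, sketched below with made-up sample data (the tuples and the `lines` list are illustrative stand-ins, not anything from the original pcap): because a dictionary keeps one value per key, building it first also keeps only the last ESSID seen for each BSSID. That may or may not be what you want.

```python
# Hypothetical sample data: the same key appears with two different values.
some_data = [
    ("foo", "bar"),
    ("foo", "gizmo"),
    ("foo", "bar"),
]

mydict = {}
for key, value in some_data:
    # A later assignment to the same key overwrites the earlier value.
    mydict[key] = value

# Stand-in for writing to the output file: one line per key.
lines = ["{},{}".format(key, mydict[key]) for key in mydict]

print(lines)  # ['foo,bar']  (the 'foo,gizmo' pairing is gone)
```

No line is repeated, but the output is one line per key rather than one line per distinct (key, value) pair.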

Answer 1 (score: 1)

Let's say you get:

t[0] = 'foo'
t[1] = 'bar'

Then we hit your code above:

if not (t[0] in ssid_pairs and ssid_pairs[t[0]] == t[1]):
    ssid_pairs[t[0]] = t[1]
    of.write(t[0] + ',' + t[1] + ((',' + f + '\n') if verbose else '\n'))

The condition passes (because t[0] is not in ssid_pairs), so we set:

ssid_pairs[t[0]] = t[1]

Which gives us:

ssid_pairs = {
  'foo': 'bar',
}

In our next iteration of the loop, we read:

t[0] = 'foo'
t[1] = 'gizmo'

Your condition passes (because ssid_pairs[t[0]] != t[1]), so we set:

ssid_pairs[t[0]] = t[1]

Which gives us:

ssid_pairs = {
  'foo': 'gizmo',
}

Then we read the same data we encountered in our first iteration:

t[0] = 'foo'
t[1] = 'bar'

But because we just set ssid_pairs['foo'] = 'gizmo', your condition will pass again, and you will once again write out the same data.
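The three iterations above can be replayed in a few lines. This is only a sketch: the `records` list is made-up sample data, and `written` is a hypothetical stand-in for the of.write(...) call so the result can be inspected.

```python
# Hypothetical input, in the order the walkthrough reads it.
records = [("foo", "bar"), ("foo", "gizmo"), ("foo", "bar")]

ssid_pairs = {}
written = []  # stands in for the lines passed to of.write(...)

for t in records:
    # Same condition as in the question.
    if not (t[0] in ssid_pairs and ssid_pairs[t[0]] == t[1]):
        ssid_pairs[t[0]] = t[1]
        written.append(t[0] + "," + t[1])

print(written)  # ['foo,bar', 'foo,gizmo', 'foo,bar']
```

The dictionary only remembers the most recent value for each key, so a pair that was already written looks new again as soon as a different value has been seen for that key in between.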

If you are trying to get a list of unique pairs, then maybe create a set of (bssid, essid) tuples:

seen_pairs = set()
...
if (t[0], t[1]) not in seen_pairs:
    seen_pairs.add((t[0], t[1]))
    of.write(t[0] + ',' + t[1] + ((',' + f + '\n') if verbose else '\n'))
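For something you can run on its own, here is a minimal sketch of the same idea. The function name `write_unique_pairs`, the sample `records`, and the use of `io.StringIO` in place of your real output file are all assumptions for illustration; the verbose/f handling from your code is left out.

```python
import io


def write_unique_pairs(records, out):
    """Write each distinct (bssid, essid) pair to `out` exactly once."""
    seen_pairs = set()
    for bssid, essid in records:
        pair = (bssid, essid)
        if pair not in seen_pairs:
            # Remember the whole pair, not just the latest value per key.
            seen_pairs.add(pair)
            out.write(bssid + "," + essid + "\n")
    return seen_pairs


# Hypothetical sample data with a repeated pair.
records = [("foo", "bar"), ("foo", "gizmo"), ("foo", "bar")]

buf = io.StringIO()  # stand-in for the open output file
seen = write_unique_pairs(records, buf)
print(buf.getvalue(), end="")
# foo,bar
# foo,gizmo
```

Because the set stores every pair that has been written, the third record is recognized as a duplicate even though a different value for the same key was seen in between.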