Question

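I currently have a requirement to compare strings containing MAC addresses (e.g. "11:22:33:AA:BB:CC") using Python 2.7. At the moment, I have a preconfigured set containing MAC addresses, and my script iterates through the set, comparing each new MAC address to those in the list. This works fine, but as the set grows, the script slows down massively. With only 100 or so entries, you notice a huge difference.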

Does anybody have any advice on speeding up this process? Is storing them in a set the best way to compare, or is it better to store them in a CSV / DB?

Sample of the code:

def Detect(p): 
    stamgmtstypes = (0,2,4)
    if p.haslayer(Dot11):
        if p.type == 0 and p.subtype in stamgmtstypes:
            if p.addr2 not in observedclients: 
                # This is the set
                with location_mutex: 
                    detection = p.addr2 + "\t" + str(datetime.now())
                    print type(p.addr2)
                    print detection, last_location
                    observedclients.append(p.addr2)

Answer 1

First, you need to profile your code to understand exactly where the bottleneck is.
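As a rough illustration of profiling, here is a minimal sketch using the standard-library cProfile and pstats modules. The functions check_addresses and make_addresses are hypothetical stand-ins for the detection loop (they are not from the question), and the addresses are made up:

```python
import cProfile
import pstats


def check_addresses(seen, incoming):
    """Hypothetical stand-in for the detection loop: record unseen addresses."""
    for addr in incoming:
        if addr not in seen:   # linear scan, because `seen` is a list here
            seen.append(addr)
    return seen


def make_addresses(count):
    """Generate `count` distinct, made-up MAC-style strings."""
    return ["00:00:00:00:%02X:%02X" % (i // 256, i % 256) for i in range(count)]


profiler = cProfile.Profile()
profiler.enable()
result = check_addresses([], make_addresses(2000) * 2)
profiler.disable()

# Print the five most expensive calls, sorted by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

If the report shows most of the time inside the function doing the membership checks, the data structure is the likely culprit rather than packet capture or printing.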

Also, as generic advice, consider psyco, although there are a few times when psyco doesn't help.

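Once you have found the bottleneck, cython may be useful, but you need to make sure that you declare all your variables in the cython source.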

Answer 2

Try using a set. To declare a set, use set(), not [] (because the latter declares an empty list).

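Lookup in a list has O(n) complexity. That is what happens in your case as the list grows: each lookup has to scan up to n elements, so the cost of every check increases with n.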

Lookup in a set has O(1) complexity on average.

http://wiki.python.org/moin/TimeComplexity
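To see the difference for yourself, here is a small sketch using the standard timeit module. The addresses are made up, and the absolute numbers will vary by machine; only the gap between the two matters:

```python
import timeit

# 5000 distinct, made-up MAC-style strings.
addresses = ["00:00:00:00:%02X:%02X" % (i // 256, i % 256) for i in range(5000)]
as_list = list(addresses)
as_set = set(addresses)

# An address that is absent forces the list to be scanned to the end.
missing = "FF:FF:FF:FF:FF:FF"

list_time = timeit.timeit(lambda: missing in as_list, number=1000)
set_time = timeit.timeit(lambda: missing in as_set, number=1000)

print("list lookup: %.4f s, set lookup: %.4f s" % (list_time, set_time))
```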

Also, you will need to change some parts of your code. There is no append method on set, so you will need to use something like observedclients.add(address).
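A minimal sketch of that change, leaving out the scapy packet handling and the lock from the question. The helper record_client is a hypothetical name introduced here for illustration:

```python
from datetime import datetime

observedclients = set()   # set() rather than [] gives O(1) average lookups


def record_client(address):
    """Return a detection line the first time `address` is seen, else None."""
    if address in observedclients:
        return None
    observedclients.add(address)   # sets use add(), not append()
    return address + "\t" + str(datetime.now())


# Example input: the second address is a repeat and produces no output.
for mac in ["11:22:33:AA:BB:CC", "11:22:33:AA:BB:CC", "11:22:33:AA:BB:DD"]:
    detection = record_client(mac)
    if detection is not None:
        print(detection)
```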

Answer 3

The post mentions "The script iterates through the set, comparing each new MAC address to those in the list."

To take full advantage of sets, don't loop over them doing one-by-one comparisons. Instead, use set operations like union(), intersection(), and difference():

s = set(list_of_strings_containing_mac_addresses)
t = set(preconfigured_set_of_mac_addresses)
print s - t, 'addresses in the list but not preconfigured'
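For a slightly fuller picture, here is a sketch of the three operations on made-up addresses. The variable names are illustrative only. This suits the case where a batch of addresses is available at once; for a per-packet callback, a single membership test against a set is the equivalent:

```python
seen = set(["11:22:33:AA:BB:CC", "11:22:33:AA:BB:DD", "11:22:33:AA:BB:EE"])
preconfigured = set(["11:22:33:AA:BB:CC", "11:22:33:AA:BB:FF"])

unknown = seen.difference(preconfigured)      # seen but not preconfigured
known = seen.intersection(preconfigured)      # present in both
everything = seen.union(preconfigured)        # present in either

print("unknown: %s" % sorted(unknown))
print("known: %s" % sorted(known))
print("total distinct addresses: %d" % len(everything))
```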

Checking for information in a data set in Python