Question

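I want to optimize my code and learn how Python behaves in terms of speed. Can you show me the fastest way to compare two sets/dicts to find out whether they have any items in common?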

I did some research, but I am still not sure whether this is the final solution.

from timeit import Timer
import random

random.seed(1)
x = 10

a = dict(zip(random.sample(range(x), x), random.sample(range(x), x)))
b = dict(zip(random.sample(range(x), x), random.sample(range(x), x)))

def setCompare():
  return len(set(a) & set(b)) > 0

def setDisjointCompare():
  return set(a).isdisjoint(set(b))

def dictCompare():
  for i in a:
    if i in b:
      return False
  return True

# print() with a single argument behaves the same on Python 2 and Python 3
print(Timer(setCompare).timeit())
print(Timer(setDisjointCompare).timeit())
print(Timer(dictCompare).timeit())

The current results are:

3.95744682634
2.87678853039
0.762627652397

Answer 1

The comments are right: your measurements are inconsistent, and I'll show you why. With your current code, I get similar results:

1.44653701782
1.15708184242
0.275780916214

If we change dictCompare() to the following:

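def dictCompare():
    # build both sets inside the timed function, like the other two methods
    temp = set(b)
    for i in set(a):
        if i in temp:
            return False
    # reached only when no key of a was found in b
    return True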

We get this result:

1.46354103088
1.14659714699
1.09220504761

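This time they are all similar (and slow), because most of the time is spent constructing the sets. The inconsistency came from including set creation in the timing of the first two methods while the third method used existing objects.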
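One way to see how much of the time goes into building the sets is to time the construction on its own. The following is a minimal sketch, assuming Python 3 and reusing the question's a and b; the helper names build_sets and intersect_prebuilt are hypothetical, and absolute timings will differ from machine to machine.

```python
import random
from timeit import timeit

random.seed(1)
x = 10
a = dict(zip(random.sample(range(x), x), random.sample(range(x), x)))
b = dict(zip(random.sample(range(x), x), random.sample(range(x), x)))

def build_sets():
    # Only the cost of converting both dicts' keys to sets.
    return set(a), set(b)

c, d = build_sets()

def intersect_prebuilt():
    # Only the cost of the intersection itself, on sets built beforehand.
    return len(c & d) > 0

n = 100000
t_build = timeit(build_sets, number=n)
t_intersect = timeit(intersect_prebuilt, number=n)
print("build sets:", t_build)
print("intersect :", t_intersect)
```

Comparing the two printed numbers shows how much of the original setCompare() timing is construction rather than comparison.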

In your comments, you said you want to exclude the time needed to create the objects you are comparing. So let's do that in a consistent way:

# add this below the definitions of a and b
c = set(a)
d = set(b)

# change setCompare and setDisjointCompare()

def setCompare():
    return len(c & d) > 0

def setDisjointCompare():
    return c.isdisjoint(d)

# restore dictCompare() so it matches the OP

Now we get this result:

0.518588066101
0.196290016174
0.269541025162

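We have leveled the playing field by making all three methods use existing objects. The first two use existing sets, and the third uses existing dicts. Unsurprisingly, the built-in method (#2) is now the fastest. But keep in mind that we had to spend time building the sets before using them. So even though isdisjoint() is the fastest method, converting dicts to sets just for the comparison would actually be slower than the third method if all we wanted in the first place was to compare the dicts.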
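If the goal is only to compare dict keys, the set construction can be avoided altogether. As a sketch, assuming Python 3 (where dict.keys() returns a set-like view; Python 2.7 offers viewkeys() instead), isdisjoint() can be called directly on the key view. In addition, set.isdisjoint() accepts any iterable, so a prebuilt set can be checked against a dict without converting the dict. The sample dicts below are illustrative only and were not timed here.

```python
a = {1: "x", 2: "y", 3: "z"}
b = {3: "p", 4: "q"}
e = {7: "r", 8: "s"}

# Key views are set-like, so no intermediate set is built here.
print(a.keys().isdisjoint(b))   # False: key 3 is shared
print(a.keys().isdisjoint(e))   # True: no shared keys

# set.isdisjoint() takes any iterable; iterating a dict yields its keys.
c = set(a)
print(c.isdisjoint(b))          # False
```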

There is one more option, similar to what was suggested in the comments:

def anyCompare():
    return not any(k in b for k in a)
# side note: we want to invert the result because we want to return false once
# we find a common element

Adding this as a fourth method gives this result:

0.511568069458
0.196676969528
0.268508911133
0.853673934937

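Unfortunately, this appears to be slower than the others, which surprised me. As far as I know, any() short-circuits the same way my explicit loop does (according to the docs), so I don't see why the explicit loop ends up faster. I suspect the short-circuit might take effect later with the any() call, since we invert the result at the end, whereas in the explicit loop the negation happens inside the loop and we can return as soon as we hit the failing condition.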
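That any() stops at the first true value can be checked directly. The sketch below, assuming Python 3.7+ (insertion-ordered dicts), counts how many membership tests are actually consumed; the names probe and checked are hypothetical. Note also that in the question's benchmark both dicts have exactly the keys 0 through 9, because random.sample(range(x), x) is a permutation, so every method finds an overlap on the first key it tests. The extra time in anyCompare() is therefore more plausibly the cost of creating and resuming the generator than a delayed short-circuit, though that is an inference and not something measured here.

```python
a = dict.fromkeys(range(10))
b = dict.fromkeys(range(10))

checked = []

def probe(k):
    # Record each key tested, then report whether it is also in b.
    checked.append(k)
    return k in b

overlap = any(probe(k) for k in a)
print(overlap)        # True
print(len(checked))   # 1: any() stopped after the first hit
```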

Of these options, the explicit loop in dictCompare() appears to be the fastest way to check whether two dicts have overlapping keys.

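By the way, the first method you are using also needs its result inverted to be consistent with the others, assuming you want to return False when there is an overlap, the same way isdisjoint() does.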
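A sketch of the four checks with one consistent convention, each returning True when the dicts share no keys (matching isdisjoint()), might look like the following. It assumes Python 3 and prebuilt sets c and d; the snake_case names and the sample dicts are hypothetical and chosen only for illustration.

```python
a = {1: 10, 2: 20, 3: 30}
b = {3: 1, 4: 2}
c, d = set(a), set(b)

def set_compare():
    # Inverted relative to the question so that True means "no overlap".
    return not (c & d)

def set_disjoint_compare():
    return c.isdisjoint(d)

def dict_compare():
    for k in a:
        if k in b:
            return False
    return True

def any_compare():
    return not any(k in b for k in a)

checks = (set_compare, set_disjoint_compare, dict_compare, any_compare)
results = [f() for f in checks]
print(results)  # [False, False, False, False]: key 3 is shared
```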

Python: fast way to check whether any item in set a is in set b
