Question

I'm looking for the fastest way to output the index of the first difference between two arrays in Python. For example, take the following arrays:

test1 = [1, 3, 5, 8]
test2 = [1]
test3 = [1, 3]

Comparing test1 and test2 should output 1, while comparing test1 and test3 should output 2.

In other words, I'm looking for an equivalent of the statement:

import numpy as np
np.where(np.where(test1 == test2, test1, 0) == '0')[0][0]

that works with arrays of different lengths.

Any help is appreciated.

Answer 1

For lists, this works:

from itertools import zip_longest

def find_first_diff(list1, list2):
    for index, (x, y) in enumerate(zip_longest(list1, list2, 
                                               fillvalue=object())):
        if x != y:
            return index

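zip_longest pads the shorter list with None or with a fill value you supply. The standard zip does not work when the difference comes from differing list lengths rather than from actually different values in the lists.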
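
As a small illustration of that difference, here is a sketch using the question's test1 and test3. The variable names pairs_zip and pairs_longest are only for this example.

```python
from itertools import zip_longest

test1 = [1, 3, 5, 8]
test3 = [1, 3]

# Plain zip stops at the end of the shorter list, so the length
# difference never shows up in the pairs.
pairs_zip = list(zip(test1, test3))

# zip_longest keeps going and pads the shorter list (with None by default).
pairs_longest = list(zip_longest(test1, test3))

print(pairs_zip)      # [(1, 1), (3, 3)]
print(pairs_longest)  # [(1, 1), (3, 3), (5, None), (8, None)]
```

With plain zip every pair matches, so a loop over it finds no difference at all. With zip_longest, the first padded pair sits at index 2, which is the answer the question asks for.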

On Python 2, use izip_longest.

Update: the function now creates a unique fill value, to avoid potential problems with None as a list value. object() is unique:

>>> o1 = object()
>>> o2 = object()
>>> o1 == o2
False
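
To make the None problem concrete, here is a sketch that compares the default padding with the object() sentinel. Both function names are hypothetical and exist only for this comparison.

```python
from itertools import zip_longest

def first_diff_none_fill(list1, list2):
    # Hypothetical variant that relies on the default fill value, None.
    for index, (x, y) in enumerate(zip_longest(list1, list2)):
        if x != y:
            return index

def first_diff_sentinel_fill(list1, list2):
    # Same loop, but padded with a fresh object() that equals nothing else.
    for index, (x, y) in enumerate(zip_longest(list1, list2, fillvalue=object())):
        if x != y:
            return index

a = [1, None]
b = [1]
print(first_diff_none_fill(a, b))      # None: the padding is mistaken for a real value
print(first_diff_sentinel_fill(a, b))  # 1
```

When a list legitimately contains None, the default padding compares equal to it and the length difference goes unnoticed. The sentinel cannot collide with any list value.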

This pure-Python approach may well be faster than a NumPy solution. It depends on the actual data and other circumstances:

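1. Converting a list into a NumPy array also takes time. This may actually take longer than finding the index with the function above. If you are not going to use the NumPy array for other computations, the conversion can cause considerable overhead.
2. NumPy always searches the full array. If the difference comes early, you do a lot more work than you need to.
3. NumPy creates a bunch of intermediate arrays. This costs memory and time.
4. NumPy needs to construct intermediate arrays with the maximum length. Comparing many small arrays with a very large one is unfavorable here.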
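
The early-exit point can be illustrated without any timing, by counting comparisons. This is a sketch only: first_diff_counting is a hypothetical helper, and the list sizes are arbitrary.

```python
def first_diff_counting(list1, list2):
    """Return (index of first difference, number of element comparisons made)."""
    comparisons = 0
    for index, (x, y) in enumerate(zip(list1, list2)):
        comparisons += 1
        if x != y:
            return index, comparisons
    return min(len(list1), len(list2)), comparisons

big = [1] * 10000
early = [1, 2] + [1] * 9998   # differs from big at index 1
late = [1] * 9999 + [2]       # differs from big at index 9999

print(first_diff_counting(big, early))  # (1, 2)
print(first_diff_counting(big, late))   # (9999, 10000)
```

The loop stops after two comparisons when the difference is early. A vectorized elementwise comparison would evaluate all 10,000 positions in both cases.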

In general, NumPy is faster than pure-Python solutions in many cases. But each case is a bit different, and there are situations where pure Python is faster.

Answer 2

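With NumPy arrays (which will be faster for big arrays), you can compare the lengths of the lists and then (also) check the overlapping part, as shown here (slicing the longer one down to the length of the shorter):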

import numpy as np

# test1 and test2 must be NumPy arrays here; with plain lists, != returns a single bool.
n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
  ans = x[0]
elif len(test1) != len(test2):
  ans = n
else:
  ans = None

Edit: although this was downvoted, I'll leave my answer here in case someone else needs to do something similar.

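If the starting arrays are large and already NumPy arrays, this is the fastest method. Also, I had to modify Andy's code to get it to work. In order: 1. my suggestion, 2. Paidric's (now deleted, but the most elegant), 3. Andy's accepted answer, 4. zip, non-NumPy, 5. vanilla Python without zip, as per @leekaiinthesky.

0.1ms, 9.6ms, 0.6ms, 2.8ms, 2.3ms

If the conversion to ndarray is included in the timeit, then the non-NumPy, no-zip method is fastest:

7.1ms, 17.1ms, 7.7ms, 2.8ms, 2.3ms

and even more so if the difference between the two lists is at index 1,000 rather than 10,000:

7.1ms, 17.1ms, 7.7ms, 0.3ms, 0.2ms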

import timeit

setup = """
import numpy as np
from itertools import zip_longest
list1 = [1 for i in range(10000)] + [4, 5, 7]
list2 = [1 for i in range(10000)] + [4, 4]
test1 = np.array(list1)
test2 = np.array(list2)

def find_first_diff(l1, l2):
    for index, (x, y) in enumerate(zip_longest(l1, l2, fillvalue=object())):
        if x != y:
            return index

def findFirstDifference(list1, list2):
  minLength = min(len(list1), len(list2))
  for index in range(minLength):
    if list1[index] != list2[index]:
      return index
  return minLength
"""

fn = ["""
n = min(len(test1), len(test2))
x = np.where(test1[:n] != test2[:n])[0]
if len(x) > 0:
  ans = x[0]
elif len(test1) != len(test2):
  ans = n
else:
  ans = None""",
"""
x = np.where(np.in1d(list1, list2) == False)[0]
if len(x) > 0:
  ans = x[0]
else:
  ans = None""",
"""
x = test1
y = np.resize(test2, x.shape)
x = np.where(np.where(x == y, x, 0) == 0)[0]
if len(x) > 0:
  ans = x[0]
else:
  ans = None""",
"""
ans = find_first_diff(list1, list2)""",
"""
ans = findFirstDifference(list1, list2)"""]

for f in fn:
  print(timeit.timeit(f, setup, number = 1000))
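
To see how the position of the difference affects the pure-Python timings, the harness can be reduced to a sketch like the following. make_pair is a hypothetical helper, the run count of 100 is arbitrary, and absolute timings will vary by machine.

```python
import timeit
from itertools import zip_longest

def find_first_diff(l1, l2):
    for index, (x, y) in enumerate(zip_longest(l1, l2, fillvalue=object())):
        if x != y:
            return index

def make_pair(diff_at):
    # Hypothetical helper: two lists that agree up to index diff_at, then differ.
    prefix = [1] * diff_at
    return prefix + [5], prefix + [4]

results = {}
for diff_at in (1000, 10000):
    list1, list2 = make_pair(diff_at)
    # Total seconds for 100 runs; the loop exits at the first difference.
    seconds = timeit.timeit(lambda: find_first_diff(list1, list2), number=100)
    results[diff_at] = (find_first_diff(list1, list2), seconds)
    print(diff_at, seconds)
```

Because the loop returns at the first mismatch, the run with the difference at index 1,000 should take roughly a tenth of the time of the run with the difference at index 10,000.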

Answer 3

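The fastest algorithm compares every element up to the first difference and no further. So iterating over the two lists pairwise like this gives you that: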

def findFirstDifference(list1, list2):
  minLength = min(len(list1), len(list2))
  for index in range(minLength):  # use xrange on Python 2
    if list1[index] != list2[index]:
      return index
  return minLength # the two lists agree where they both have values, so return the next index

This gives the output you want:

print(findFirstDifference(test1, test3))
> 2

Answer 4

Here is one way to do it:

def compare_lists(lista, listb):
    """
    Compare two lists and return the first index where they differ. If
    they are equal, return the list length.
    """
    # On Python 3, zip is already lazy; on Python 2, itertools.izip is the lazy equivalent.
    for position, (a, b) in enumerate(zip(lista, listb)):
        if a != b:
            return position
    return min(len(lista), len(listb))

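The algorithm is simple: zip (or, on Python 2, the more efficient izip) the two lists, then compare them element by element.
The enumerate function gives the index position, which we can return as soon as a difference is found.
If we exit the for loop without returning, one of two things has happened:
1. The two lists are identical. In this case, we want to return the length of either list.
2. The lists have different lengths and are equal up to the length of the shorter list. In this case, we want to return the length of the shorter list.

Note that this function has a bug: if you compare two empty lists it returns 0, which seems wrong. I'll leave fixing it to you as an exercise.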
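
One possible way to handle the empty-list case, sketched here under the assumption that "no difference" should be reported as None (the answer itself does not say what the right result is), is to check the lengths after the loop. compare_lists_or_none is a hypothetical name.

```python
def compare_lists_or_none(lista, listb):
    """Return the first index where the lists differ, or None if they are equal."""
    for position, (a, b) in enumerate(zip(lista, listb)):
        if a != b:
            return position
    if len(lista) == len(listb):
        return None  # identical lists, including two empty lists
    # Equal up to the end of the shorter list: the difference starts there.
    return min(len(lista), len(listb))

print(compare_lists_or_none([1, 3, 5, 8], [1, 3]))  # 2
print(compare_lists_or_none([1, 3], [1, 3]))        # None
print(compare_lists_or_none([], []))                # None
```

This changes the return value for identical lists from the list length to None, so callers can tell "equal" apart from "differs at the end".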

Answer 5

Thanks for all your suggestions. I just found a much simpler way to handle my problem:

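import numpy as np
x = np.array(test1)
y = np.resize(np.array(test2), x.shape)  # repeats test2 cyclically to fill x.shape
# Compare with the integer 0, not the string '0'; assumes test1 contains no zeros.
np.where(np.where(x == y, x, 0) == 0)[0][0]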

Answer 6

Here's an admittedly not very Pythonic, NumPy-free stab at it:

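b = list(zip(test1, test2))  # zip() is lazy on Python 3, so materialize it
c = 0
# Drop leading pairs while they match; c counts how many were dropped.
while b and b[0][0] == b[0][1]:
    b = b[1:]
    c = c + 1
print(c)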

Answer 7

For Python 3.x:

def first_diff_index(ls1, ls2):
    l = min(len(ls1), len(ls2))
    return next((i for i in range(l) if ls1[i] != ls2[i]), l)

(On Python 2.7, replace range with xrange.)
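
The one-liner relies on the second argument of next(), which is returned when the generator yields nothing. A short sketch of that behavior follows, with the function repeated so the block runs on its own. The "default" string and the variable name empty are only for illustration.

```python
def first_diff_index(ls1, ls2):
    l = min(len(ls1), len(ls2))
    # The generator yields indices of mismatches; next() takes the first one,
    # or falls back to l when the overlapping part is identical.
    return next((i for i in range(l) if ls1[i] != ls2[i]), l)

empty = (i for i in range(3) if i > 10)  # a generator that yields nothing
print(next(empty, "default"))            # default

print(first_diff_index([1, 3, 5, 8], [1]))     # 1
print(first_diff_index([1, 3, 5, 8], [1, 9]))  # 1
print(first_diff_index([1, 3, 5, 8], [1, 3]))  # 2
```

Because the generator is lazy, next() stops evaluating as soon as the first mismatching index is produced.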

Python: fastest way to compare array elements
