Question

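I'm comparing two lists and, whenever there is a match, removing the first instance of the duplicate and then continuing. I know there are many duplicates between these lists, so I can't just use a list comprehension or something similar, because I need to see which side has more instances; I'm basically just subtracting the shared elements from both lists.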

Here is my code:

toDelFromrbIndex = []
toDelFromabIndex = []
for rbIndex, (barcode, timestamp, prepack, workorder) in enumerate(restoredBottles):
    for abIndex, (idx, bcode, tstamp, tableName) in enumerate(allBottles):
        if barcode == bcode and timestamp == tstamp:
            # Remove from both lists
            toDelFromrbIndex.append(rbIndex)
            toDelFromabIndex.append(abIndex)

for index in toDelFromrbIndex:
    del restoredBottles[index]

for index in toDelFromabIndex:
    del allBottles[index]

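Before this, I deleted the items right where "toDelFromrbIndex.append(rbIndex)" is now, and realized that this was messing up my iteration and probably skipping items. So I now store the indices first and delete them all from both lists afterwards.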

However, this for index in toDelFromrbIndex: del restoredBottles[index] gives me an index out of range error. Why?

Answer 1

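You are deleting indices from smallest to largest. Each deletion shifts every element to the right of the deleted index down one position, so whatever was at index N moves to N-1.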

By the end, the last indices you try to delete may point past the end of the list. The following raises an IndexError too:

foo = [17, 42]
for index in (0, 1):
    del foo[index]

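That is because we first delete 17 at index 0. Removing the first element means 42 becomes the element at index 0, and there is nothing left at index 1.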

You need to delete the highest indices first, so process the indices in reverse:

for index in reversed(toDelFromrbIndex):
    del restoredBottles[index]

for index in sorted(toDelFromabIndex, reverse=True):
    del allBottles[index]

I sorted toDelFromabIndex because you can end up adding indices to it in any order.
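As a minimal sketch of the two safe options (delete from the highest index down, or build a new list that skips the unwanted positions): the helper names delete_indices and without_indices are hypothetical and not part of the question's code, and dropping repeated indices with set() is an assumption about what is wanted when the same index was collected more than once.

```python
def delete_indices(items, indices):
    """Delete the given positions from items in place, highest index first."""
    # set() drops repeated indices (an assumption, see above); sorting in
    # reverse means each deletion only shifts elements we no longer need to address.
    for index in sorted(set(indices), reverse=True):
        del items[index]


def without_indices(items, indices):
    """Return a new list that skips the given positions; items is left untouched."""
    skip = set(indices)
    return [item for i, item in enumerate(items) if i not in skip]


foo = [17, 42, 99]
delete_indices(foo, [0, 1])
print(foo)  # [99]
print(without_indices([17, 42, 99], [0, 1]))  # [99]
```

Rebuilding the list avoids the shifting problem entirely, at the cost of creating a second list.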

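One more thing to note: your current way of matching "bottles" is inefficient. You are using nested loops, so for N restoredBottles entries and M allBottles entries you perform O(NM) tests. The running time grows with the product of the two list sizes. For example, with N = 100 and M = 1000 you make 100,000 comparisons; with N = 200 that becomes 200,000, and changing M to 5000 (with N = 100) gives 500,000.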
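The arithmetic can be checked with a short sketch. The function name nested_comparisons is hypothetical; it simply counts the pair tests a full nested loop performs and prints N + M next to it for comparison.

```python
def nested_comparisons(n, m):
    """Number of pair tests a full nested loop performs for lists of size n and m."""
    return n * m


for n, m in [(100, 1000), (200, 1000), (100, 5000)]:
    # Product of the sizes (nested loops) versus their sum (one pass over each list).
    print(n, m, nested_comparisons(n, m), n + m)
```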

If you use an intermediate dictionary, you can reduce that to O(N + M) steps:

# mapping from barcode and timestamp, to index in restoredBottles
bcts_idx = {}
for i, (bc, ts, *_) in enumerate(restoredBottles):
    bcts_idx.setdefault((bc, ts), []).append(i)

toDelFromrbIndex = []
toDelFromabIndex = []
for abIndex, (idx, bcode, tstamp, tableName) in enumerate(allBottles):
    for rbIndex in bcts_idx.get((bcode, tstamp), ()):
        # Remove from both lists
        toDelFromrbIndex.append(rbIndex)
        toDelFromabIndex.append(abIndex)
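Putting the pieces together, here is a runnable sketch of the dictionary lookup followed by the deletions. The sample data is hypothetical; only the tuple layouts follow the question's code. With this lookup the restoredBottles indices are collected in allBottles order, so both index lists are sorted before deleting. Using set() to drop repeated indices is an assumption about the desired behavior when one entry matches several entries on the other side.

```python
# Hypothetical sample data; the tuple layouts follow the question's code.
restoredBottles = [
    ("A1", 100, "p1", "w1"),
    ("B2", 200, "p1", "w1"),
    ("C3", 300, "p2", "w2"),
]
allBottles = [
    (0, "C3", 300, "t1"),
    (1, "Z9", 900, "t1"),
    (2, "A1", 100, "t2"),
]

# Mapping from (barcode, timestamp) to the indices in restoredBottles.
bcts_idx = {}
for i, (bc, ts, *_) in enumerate(restoredBottles):
    bcts_idx.setdefault((bc, ts), []).append(i)

toDelFromrbIndex = []
toDelFromabIndex = []
for abIndex, (idx, bcode, tstamp, tableName) in enumerate(allBottles):
    for rbIndex in bcts_idx.get((bcode, tstamp), ()):
        toDelFromrbIndex.append(rbIndex)
        toDelFromabIndex.append(abIndex)

# Delete the highest indices first so the remaining indices stay valid.
for index in sorted(set(toDelFromrbIndex), reverse=True):
    del restoredBottles[index]
for index in sorted(set(toDelFromabIndex), reverse=True):
    del allBottles[index]

print(restoredBottles)  # [('B2', 200, 'p1', 'w1')]
print(allBottles)       # [(1, 'Z9', 900, 't1')]
```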
