Question

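Given two lists that are both sorted numerically, I want to find all elements of B that are not in A. I know Python offers many ways to make this easy (for example, using setdiff()), but I am looking for a more specific approach that uses moving index pointers.

The simplest approach is to do a full comparison of every element: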

def exclude_list(list_a, list_b):
    ret_list = []
    for element_b in list_b:
        if element_b not in list_a:
            ret_list.append(element_b)
    return ret_list

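I would like to use a moving-index approach, i.e. with "pointers" idx_a and idx_b. Since both lists are sorted, if list_b[idx_b] < list_a[idx_a], the element cannot be in list_a and can be added to the result right away.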

def exclude_list_fast(list_a, list_b):
    ret_list = []

    # 3 scenarios,
    #   1. list_b[idx_b] < list_a[idx_a], immediately add into ret_list.
    #        - idx_b += 1
    #        - no change to idx_a
    #   2. list_b[idx_b] = list_a[idx_a], item is found.
    #       - idx_b += 1
    #       - no change to idx_a
    #   3. list_b[idx_b] > list_a[idx_a], item may still be ahead.
    #       - idx_a += 1
    #       - no change to idx_b
    #       - compare again until result falls within the first 2 cases

    idx_a = 0

    for idx_b in range(len(list_b)):

        # Case 3: advance idx_a while list_a[idx_a] is still smaller.
        while idx_a < len(list_a) and list_b[idx_b] > list_a[idx_a]:
            idx_a += 1

        # If idx_a has run past the end of list_a, add to the ret_list.
        if idx_a == len(list_a):
            ret_list.append(list_b[idx_b])
        # Case 1: list_a has already moved past this value.
        elif list_b[idx_b] < list_a[idx_a]:
            ret_list.append(list_b[idx_b])
        # Case 2: the values are equal, so the item is in list_a; skip it.

    return ret_list

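I would love to know whether there is a more elegant and more computationally efficient solution that uses moving indices. I would appreciate any constructive guidance.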

Answer 1

You can do this very easily with the set methods. In this particular case, the difference method can help you:

>>> l1 = ['a', 'b', 'c', 'd']
>>> l2 = ['a', 'c', 'x', 'y']
>>> set(l2).difference(l1)
{'y', 'x'}

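If you absolutely need a list, you can convert the result (a set is unordered, so the order of the items is not guaranteed):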

>>> list(set(l2).difference(l1))
['y', 'x']
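Because a set is unordered, the result above does not keep the sorted order of l2. The sketch below shows one way to keep that order while still using a set for fast membership tests. The function name `exclude_with_set` is an illustrative assumption, not part of the answer, and it assumes the elements are hashable.

```python
def exclude_with_set(list_a, list_b):
    """Return the items of list_b that are not in list_a, keeping list_b's order."""
    seen_in_a = set(list_a)  # set membership tests are O(1) on average
    return [item for item in list_b if item not in seen_in_a]


l1 = ['a', 'b', 'c', 'd']
l2 = ['a', 'c', 'x', 'y']
print(exclude_with_set(l1, l2))        # ['x', 'y']
print(sorted(set(l2).difference(l1)))  # ['x', 'y']
```

The list comprehension also keeps duplicates from list_b, whereas the `sorted(set(...))` form collapses them.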

Answer 2

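Well, the problem with your current code is that `in` on a list takes linear time, O(n). Since both lists are sorted, you can use the following algorithm instead: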

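from bisect import bisect_left

# 1) Loop over B.
# 2) Look for B[i] in list_a with a binary search (list_a is sorted) and save the index.
# 3) For the next item in B, binary-search list_a again, starting from the last saved index.

# binary_search was not defined in the original answer; this bisect-based
# helper is one possible implementation of it.
def binary_search(item, sorted_list, start=0):
    """Return the index of item in sorted_list[start:], or -1 if it is absent."""
    index = bisect_left(sorted_list, item, start)
    if index < len(sorted_list) and sorted_list[index] == item:
        return index
    return -1

def exclude_list(list_a, list_b):
    ret_list = []
    start_in_list_a = 0
    for element_b in list_b:
        index_in_list_a = binary_search(element_b, list_a, start_in_list_a)
        if index_in_list_a == -1:
            ret_list.append(element_b)
        else:
            start_in_list_a = index_in_list_a
    return ret_list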

Answer 3

If you are able to produce the results with a generator function rather than returning a list, this can be done very simply and elegantly:

def iterdiff(a, b):
    """Yields values from b that are not present in a.

    Both a and b must be sorted iterables of comparable types.
    """
    exhausted = object()
    existing = iter(a)
    next_existing = next(existing, exhausted)
    for candidate in b:
        while next_existing is not exhausted and next_existing < candidate:
            next_existing = next(existing, exhausted)
        if next_existing is exhausted or next_existing > candidate:
            yield candidate

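This implementation runs in O(n + m) time and is guaranteed to iterate over b exactly once and over a exactly once. It also works with any iterable, not just lists.

If you really do want to return a list object, it is easy to modify the function to build up the result instead, but I think the generator form is much more elegant.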
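As a rough sketch of that modification, the version below appends to a list instead of yielding. The name `listdiff` is an illustrative assumption, not something from the answer.

```python
def listdiff(a, b):
    """Return a list of the values from b that are not present in a.

    Both a and b must be sorted iterables of comparable types.
    """
    exhausted = object()  # sentinel: marks that a has run out
    existing = iter(a)
    next_existing = next(existing, exhausted)
    result = []
    for candidate in b:
        # Advance through a until it catches up with the candidate.
        while next_existing is not exhausted and next_existing < candidate:
            next_existing = next(existing, exhausted)
        if next_existing is exhausted or next_existing > candidate:
            result.append(candidate)
    return result


print(listdiff([1, 3, 5, 7], [1, 2, 3, 4, 8]))  # [2, 4, 8]
```

Calling `list(iterdiff(a, b))` on the generator should give the same result without a second function.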

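To explain how it works:

We keep two "pointers" into the two input iterables in the form of iterator objects: the implicit iterator in the for loop iterates over b, while the explicit iterator object existing iterates over a.

Our main loop iterates over b. On each iteration, we need to decide whether to yield the object from b (if it is unique) or not (if it duplicates an object in a).

If next_existing is "behind" (less than) the candidate object, we keep advancing through a until we find an item equal to or greater than candidate (or until we reach the end of a).

If next_existing is "ahead of" (greater than) the candidate object, we yield candidate, because it cannot exist in a. (If it were in a, we would already have reached it, and since both lists are sorted, we know we could not have passed it without considering that candidate.) We keep yielding from b until we catch up with the current value from the existing iterator (or until we reach the end of b).

If next_existing is equal to our candidate object, we do not yield candidate, so it is omitted from the result.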
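To make the three cases concrete, here is a small tracing sketch that follows the same logic but records the decision taken for each candidate instead of yielding. The function name `trace_diff` and the decision strings are illustrative assumptions.

```python
def trace_diff(a, b):
    """Return (candidate, decision) pairs for each value in sorted iterable b."""
    exhausted = object()  # sentinel: marks that a has run out
    existing = iter(a)
    next_existing = next(existing, exhausted)
    steps = []
    for candidate in b:
        # "Behind" case: advance through a until it catches up.
        while next_existing is not exhausted and next_existing < candidate:
            next_existing = next(existing, exhausted)
        if next_existing is exhausted:
            steps.append((candidate, "yield: a is exhausted"))
        elif next_existing > candidate:
            steps.append((candidate, "yield: a is ahead at %r" % (next_existing,)))
        else:
            steps.append((candidate, "skip: also in a"))
    return steps


for candidate, decision in trace_diff([2, 4, 6], [1, 2, 5, 7]):
    print(candidate, decision)
# 1 yield: a is ahead at 2
# 2 skip: also in a
# 5 yield: a is ahead at 6
# 7 yield: a is exhausted
```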

Answer 4

Your approach looks right, but it can be simplified further. You can iterate over one list directly while keeping an index into the other:

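# The wrapper name is illustrative; yield needs an enclosing function.
# Note the roles: this emits the items of a that are not in b, so call it
# as exclude_sorted(list_b, list_a) to get the result the question asks for.
def exclude_sorted(a, b):
    i = 0
    for x in a:
        # skip entries in b that are smaller than x
        while i < len(b) and b[i] < x:
            i += 1
        # if b is exhausted or we moved past x, it's not in b, and can be emitted.
        if i == len(b) or b[i] > x:
            yield x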

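Strictly speaking, the other list can be iterated over as well, without accessing it directly by index. This requires explicit iter() and next() calls, and catching the StopIteration exception. (The advantage of this code is that both arguments can be arbitrary iterables and do not have to be lists.)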

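# The wrapper name is illustrative; yield needs an enclosing function.
def exclude_sorted_iter(a, b):
    ib = iter(b)
    exhausted = False  # becomes True once b has no more items
    try:
        y = next(ib)
    except StopIteration:
        exhausted = True
    for x in a:
        try:
            # skip entries in b that are smaller than x
            while not exhausted and y < x:
                y = next(ib)
        except StopIteration:
            exhausted = True
        # if b is exhausted or we moved past x, it's not in b, and can be emitted.
        if exhausted or y != x:
            yield x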

In both cases the code returns a generator, but it can easily be turned into a list by calling list() on it.
