Question

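I want to write a Python function that removes all occurrences of a value from a list, and does so in linear time (O(n)). There are similar questions, but none of them has the linear-time constraint.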

I managed to come up with a solution, but I am looking for a simpler, cleaner version (or, preferably, a completely different implementation).

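Below I use two "pointers" (back and front) to traverse the list. Both start at 0 and advance until they reach the value we want to remove. At that point, the front pointer checks whether the following integers are also the value to remove, and keeps advancing until it reaches the first integer that is not. Then I swap the two values and continue the process.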

def remove_all(lst, val):
    back = 0
    front = 0
    while (front < len(lst)-1):
        while(front < len(lst) and lst[front] != val):
            front += 1
            back += 1
        if (front < len(lst)-1 and lst[front] == val):
            while(front < len(lst)-1 and lst[front] == val):
                front += 1
            lst[back], lst[front] = lst[front], lst[back]
            back += 1
    if (lst[-1] != val):
        raise ValueError('Value not present in the list')
    while(len(lst) != 0 and lst[-1] == val):
        lst.pop()

Answer 1

I'm not sure what you're trying to do, but you're overcomplicating it. There are two ways to do this.

Option 1
Return a new list. This can be done with a list comprehension or a loop.

def remove_all(lst, val):
    return [x for x in lst if x != val]

Or,

def remove_all_gen(lst, val):
    for i in lst:
        if i != val:
            yield i

print(list(remove_all([1, 1, 2, 2, 3, 1, 2], 1)))
[2, 2, 3, 2]

Both solutions are linear.
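As a small sketch building on the comprehension above: if other code holds a reference to the same list and needs to see the change, the filtered result can be written back with slice assignment. The name `remove_all_inplace` is made up for this illustration. It still runs in linear time, but it temporarily builds a second list.

```python
def remove_all_inplace(lst, val):
    """Hypothetical helper: filter into a new list, then write it back."""
    # Slice assignment replaces the contents of the existing list object,
    # so every other reference to `lst` sees the result.
    lst[:] = [x for x in lst if x != val]


data = [1, 1, 2, 2, 3, 1, 2]
alias = data                 # second reference to the same list object
remove_all_inplace(data, 1)
print(data)                  # [2, 2, 3, 2]
print(alias is data)         # True
```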

Option 2
To modify the list in place, you can use del.

Reverse iteration

def remove_all(lst, val):
    for i in range(len(lst) - 1, -1, -1):
        if lst[i] == val:
            del lst[i]

l = [1, 1, 2, 2, 3, 1, 2]
remove_all(l, 1)

print(l)
[2, 2, 3, 2]

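Although this solution modifies the list in place, it is no longer linear (thanks to the comments for pointing this out): del is itself a linear operation, since it has to shift the elements after the deleted position.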

As a matter of good practice, if you are mutating in place, don't return anything from the function.
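For an in-place version that stays linear, one possible sketch is to compact the kept elements toward the front with a single write index and then trim the tail once. The name `remove_all_linear` is an assumption for this example, not something from the question.

```python
def remove_all_linear(lst, val):
    """Hypothetical helper: remove every `val` from `lst` in place, O(n)."""
    write = 0  # next slot for an element we keep
    for item in lst:
        if item != val:
            # `write` never runs ahead of the read position, so this
            # only overwrites elements that were already visited.
            lst[write] = item
            write += 1
    # Everything from `write` onward is leftover; drop it in one step.
    del lst[write:]


l = [1, 1, 2, 2, 3, 1, 2]
remove_all_linear(l, 1)
print(l)  # [2, 2, 3, 2]
```

Unlike the question's version, this does not raise when the value is absent; it simply leaves the list unchanged.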

Answer 2

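Normally you would use a list comprehension, a generator expression, or the filter function, but since your teacher insists on doing it in place, you can't use them on their own. Much as I hate side effects, here you go: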

def remove_occurences(lst, val):
    # using filter in Python 3 (or ifilter in Python 2) requires O(1) memory;
    # it's not usually a good thing to update the object you are iterating,
    # but here we can be sure, nothing will go wrong: we can never overwrite
    # a value we must take before getting hold of its reference
    i = -1
    for i, value in enumerate(filter(lambda x: x != val, lst)):
        lst[i] = value
    # remove the tail
    del lst[i+1:len(lst)]
    return lst

Since we are dealing with imperative code, testing is imperative:

from hypothesis import strategies as st
from hypothesis import given

# `average_size` is omitted here: newer Hypothesis releases no longer accept it.
@given(st.lists(elements=st.integers(0, 9), min_size=0, max_size=100),
       st.integers(0, 9))
def test(lst, val):
    assert remove_occurences(lst[:], val) == list(filter(lambda x: x != val, lst))

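Calling test() runs the check against many randomly generated inputs. The function passes. Now, since del on a list takes O(n) on average, we have to make sure that our tail deletion is a special case (this is most likely implementation-dependent, but I guess the people behind CPython were smart enough to optimize it long ago). Let's run some benchmarks: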

In [45]: %timeit remove_occurences([1, 2, 3, 4, 5]*1, 3)
1.31 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [46]: %timeit remove_occurences([1, 2, 3, 4, 5]*10, 3)
6.9 µs ± 243 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [47]: %timeit remove_occurences([1, 2, 3, 4, 5]*100, 3)
68.3 µs ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [48]: %timeit remove_occurences([1, 2, 3, 4, 5]*1000, 3)
733 µs ± 54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [49]: %timeit remove_occurences([1, 2, 3, 4, 5]*10000, 3)
7.07 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The growth is clearly linear. Given the source (see the comments), this is actually expected:

...
    if (value == NULL) {
        /* delete slice */
        PyObject **garbage;
        size_t cur;
        Py_ssize_t i;
        int res;

        if (slicelength <= 0)
            return 0;

        if (step < 0) {
            stop = start + 1;
            start = stop + step*(slicelength - 1) - 1;
            step = -step;
        }

        garbage = (PyObject**)
            PyMem_MALLOC(slicelength*sizeof(PyObject*));
        if (!garbage) {
            PyErr_NoMemory();
            return -1;
        }

        /* drawing pictures might help understand these for
           loops. Basically, we memmove the parts of the
           list that are *not* part of the slice: step-1
           items for each item that is part of the slice,
           and then tail end of the list that was not
           covered by the slice */
...

Since we are deleting the tail, nothing needs to be moved, which keeps the whole thing a linear operation.
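A rough way to check this on your own machine is to time deleting a short slice from the tail against deleting a slice of the same length from the head. This is only a sketch: the helper name `time_slice_delete` and the sizes are arbitrary choices, and the numbers will vary between machines and interpreters.

```python
import timeit


def time_slice_delete(n, k, from_tail, repeat=5):
    """Hypothetical helper: best time for one `del` of k items out of n."""
    stmt = "del lst[-k:]" if from_tail else "del lst[:k]"
    # number=1 so the setup rebuilds the list before every timed deletion.
    timings = timeit.repeat(
        stmt,
        setup="lst = list(range(n))",
        repeat=repeat,
        number=1,
        globals={"n": n, "k": k},
    )
    return min(timings)


for n in (10_000, 100_000, 1_000_000):
    tail = time_slice_delete(n, 10, from_tail=True)
    head = time_slice_delete(n, 10, from_tail=False)
    print(f"n={n}: tail={tail:.7f}s  head={head:.7f}s")
```

If the reasoning above holds, the tail timings should stay roughly flat as n grows, while the head timings grow with n because the remaining items have to be shifted.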

Answer 3

How about:

def remove_all(l, item):
    try:
        while True:
            i = l.index(item)
            l.pop(i)
    except ValueError:
        return l
