This is a follow-up to a similar question, which asked for the best way to write
for item in somelist:
    if determine(item):
        code_to_remove_item
The consensus seemed to be something like
somelist[:] = [x for x in somelist if not determine(x)]
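However, I think that if you are only removing a few items, most of the items are being copied into the same object, and that may be slow. In an answer to another related question, someone suggests:
for item in reversed(somelist):
    if determine(item):
        somelist.remove(item)
However, list.remove here has to search for the item, which is O(N) in the length of the list. We may be limited by the fact that the list is represented as an array rather than a linked list, so removing an item requires moving everything after it. However, it is suggested here that collections.deque is represented as a doubly linked list, in which case it should be possible to remove in O(1) while iterating. How would we actually accomplish this?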
Update: I also did some timing tests, using the following code:
import timeit
setup = """
import random
random.seed(1)
b = [(random.random(),random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
    return (x[1]>.45) and (x[1]<.5)
"""
listcomp = """
c[:] = [x for x in b if tokeep(x)]
"""
filt = """
c = filter(tokeep, b)
"""
print "list comp = ", timeit.timeit(listcomp,setup, number = 10000)
print "filtering = ", timeit.timeit(filt,setup, number = 10000)
which gave:
list comp = 4.01255393028
filtering = 3.59962391853
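The timing script above is Python 2 (xrange, the print statement). A rough Python 3 port might look like the sketch below. In Python 3, filter returns a lazy iterator, so it is wrapped in list(...) here to do comparable work. The run_benchmark helper and its number parameter are names made up for this sketch, and the timings it prints will not match the figures quoted above.

```python
import timeit

SETUP = """
import random
random.seed(1)
b = [(random.random(), random.random()) for i in range(1000)]
c = []
def tokeep(x):
    return 0.45 < x[1] < 0.5
"""

LISTCOMP = "c[:] = [x for x in b if tokeep(x)]"
# list() forces the lazy Python 3 filter object to be consumed
FILT = "c = list(filter(tokeep, b))"


def run_benchmark(number=1000):
    """Hypothetical helper: return both timings in seconds."""
    return {
        "list comp": timeit.timeit(LISTCOMP, SETUP, number=number),
        "filtering": timeit.timeit(FILT, SETUP, number=number),
    }


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name} = {seconds:.4f}")
```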
Answer 0: (score: 16)
The list comprehension is the asymptotically optimal solution:
somelist = [x for x in somelist if not determine(x)]
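It makes only one pass over the list, so it runs in O(n) time. Since you need to call determine() on each object, any algorithm will require at least O(n) operations. The list comprehension does have to do some copying, but it only copies references to the objects, not the objects themselves.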
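To make the point about references concrete, here is a small sketch (the record data and the determine predicate are invented for illustration). It also shows how rebinding the name, as in this answer, differs from the slice assignment used in the question: only the latter changes the list object that other names may still refer to.

```python
def determine(record):
    """Hypothetical predicate: True for records that should be removed."""
    return record["stale"]


original = [{"id": 1, "stale": False},
            {"id": 2, "stale": True},
            {"id": 3, "stale": False}]
alias = original            # a second name for the same list object

# Building a new list leaves the existing list object untouched.
rebound = [x for x in original if not determine(x)]
len_after_rebind = len(alias)

# The kept elements are the very same dict objects, not copies.
same_objects = all(a is b for a, b in zip(rebound, (original[0], original[2])))

# Slice assignment replaces the contents of the existing list object,
# so every name bound to it observes the change.
original[:] = [x for x in original if not determine(x)]

print(len(rebound), len_after_rebind, same_objects, alias is original, len(alias))
```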
Removing an item from a list in Python is O(n), so anything with a remove, pop, or del inside the loop will be O(n**2).
Also, in CPython list comprehensions are faster than for loops.
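One way to check these claims on your own machine is a small timing sketch like the following. The predicate, the data size, and the function names are assumptions made for this example, and the absolute numbers will vary between interpreters and machines.

```python
import timeit


def determine(x):
    """Hypothetical predicate: remove multiples of 10."""
    return x % 10 == 0


def by_comprehension(items):
    return [x for x in items if not determine(x)]


def by_append_loop(items):
    kept = []
    for x in items:
        if not determine(x):
            kept.append(x)
    return kept


def by_remove_in_loop(items):
    items = list(items)              # work on a copy
    for x in reversed(items):        # reversed() as in the question
        if determine(x):
            items.remove(x)          # linear search plus element shifting
    return items


if __name__ == "__main__":
    data = list(range(5000))         # unique values
    for func in (by_comprehension, by_append_loop, by_remove_in_loop):
        seconds = timeit.timeit(lambda: func(data), number=20)
        print(f"{func.__name__}: {seconds:.4f}s")
```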
Answer 1: (score: 3)
If you need to remove items in O(1), you can use a hash map (a dict or set in Python).
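As a minimal sketch of that idea, assuming the items can be stored under unique keys (the readings data and the determine predicate are invented here): deleting a key from a dict is O(1) on average and shifts nothing. A full filtering pass is still O(n), and the snapshot of the keys is itself a copy, so this mainly pays off when items are deleted one at a time by key.

```python
def determine(value):
    """Hypothetical predicate: True for values that should be removed."""
    return value < 0


# Keys are arbitrary unique ids; dicts keep insertion order in Python 3.7+.
readings = {"a": 3, "b": -1, "c": 7, "d": -4}

# Iterate over a snapshot of the keys: deleting from a dict while iterating
# over the dict itself raises RuntimeError.
for key in list(readings):
    if determine(readings[key]):
        del readings[key]      # average-case O(1), no shifting of neighbours

print(readings)                # {'a': 3, 'c': 7}
```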
Answer 2: (score: 3)
Since list.remove is equivalent to del list[list.index(x)], you could do:
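for idx, item in enumerate(somelist):
    if determine(item):
        # caution: this shifts later items left, so the item that
        # follows each deletion is never examined
        del somelist[idx]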
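But: you should not modify a list while iterating over it. It will bite you sooner or later. Use filter or a list comprehension first, and optimise later.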
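A small sketch of how this bites, with an invented list and predicate: when two matching items sit next to each other, deleting the first moves the second into the slot the loop has just left, so it survives.

```python
def determine(x):
    """Hypothetical predicate: remove every 2."""
    return x == 2


somelist = [1, 2, 2, 3]
for idx, item in enumerate(somelist):
    if determine(item):
        del somelist[idx]      # shifts the second 2 into slot idx ...
# ... and the loop then moves on to idx + 1, never inspecting it.
print(somelist)                # [1, 2, 3]

filtered = [x for x in [1, 2, 2, 3] if not determine(x)]
print(filtered)                # [1, 3]
```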
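Answer 3: (score: 3)
The deque is optimized for removal at the head and tail, not for arbitrary removal in the middle. The removal itself is fast, but you still have to traverse the list to the removal point. If you are iterating over the entire length anyway, then the only difference between filtering a deque and filtering a list (using filter or a comprehension) is the overhead of copying, which at worst is a constant multiple; it is still an O(n) operation. Also note that the objects in the list are not being copied, just the references to them. So it is not that much overhead.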
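For what it is worth, a deque can be filtered in place using only the head and tail operations it is good at. The sketch below is one possible way to do it, not something taken from the linked discussion; filter_deque_in_place and the sample predicate are names made up here. It is still a single O(n) pass, just without building a second container.

```python
from collections import deque


def filter_deque_in_place(d, determine):
    """Drop every item for which determine(item) is true, keeping order."""
    for _ in range(len(d)):    # len is evaluated once, before the loop
        item = d.popleft()     # O(1) removal from the head
        if not determine(item):
            d.append(item)     # O(1) re-insertion at the tail


d = deque([5, 12, 7, 30, 1])
filter_deque_in_place(d, lambda x: x > 10)   # hypothetical predicate
print(d)                       # deque([5, 7, 1])
```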
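It is possible to avoid the copying like so, but I have no particular reason to believe this is faster than a straightforward list comprehension, and it probably is not:
write_i = 0
for read_i in range(len(L)):
    # copy the current item down to the write position
    L[write_i] = L[read_i]
    # advance the write position only for items that are kept
    if L[read_i] not in ['a', 'c']:
        write_i += 1
# drop the leftover tail in one step
del L[write_i:]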
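The same read/write-index idea can be wrapped in a function that takes an arbitrary predicate, which makes it easier to test. The name compact_in_place and its keep parameter are assumptions for this sketch.

```python
def compact_in_place(items, keep):
    """Keep only the items for which keep(item) is true, without a new list."""
    write_i = 0
    for read_i in range(len(items)):
        item = items[read_i]
        if keep(item):
            items[write_i] = item   # slide the kept item down
            write_i += 1
    del items[write_i:]             # drop the leftover tail in one step


L = list("abcadce")
compact_in_place(L, lambda x: x not in ("a", "c"))
print(L)                            # ['b', 'd', 'e']
```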
Answer 4: (score: 2)
This code has been edited since it was first posted.
I had trouble with the timing, and I may well be doing it wrong.
import timeit
setup = """
import random
random.seed(1)
global b
setup_b = [(random.random(), random.random()) for i in xrange(1000)]
c = []
def tokeep(x):
    return (x[1]>.45) and (x[1]<.5)
# define and call to turn into psyco bytecode (if using psyco)
b = setup_b[:]
def listcomp():
    c[:] = [x for x in b if tokeep(x)]
listcomp()
b = setup_b[:]
def filt():
    c = filter(tokeep, b)
filt()
b = setup_b[:]
def forfilt():
    marked = (i for i, x in enumerate(b) if tokeep(x))
    shift = 0
    for n in marked:
        del b[n - shift]
        shift += 1
forfilt()
b = setup_b[:]
def forfiltCheating():
    marked = (i for i, x in enumerate(b) if (x[1] > .45) and (x[1] < .5))
    shift = 0
    for n in marked:
        del b[n - shift]
        shift += 1
forfiltCheating()
"""
listcomp = """
b = setup_b[:]
listcomp()
"""
filt = """
b = setup_b[:]
filt()
"""
forfilt = """
b = setup_b[:]
forfilt()
"""
forfiltCheating = '''
b = setup_b[:]
forfiltCheating()
'''
psycosetup = '''
import psyco
psyco.full()
'''
print "list comp = ", timeit.timeit(listcomp, setup, number = 10000)
print "filtering = ", timeit.timeit(filt, setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, setup, number = 10000)
print '\nnow with psyco \n'
print "list comp = ", timeit.timeit(listcomp, psycosetup + setup, number = 10000)
print "filtering = ", timeit.timeit(filt, psycosetup + setup, number = 10000)
print 'forfilter = ', timeit.timeit(forfilt, psycosetup + setup, number = 10000)
print 'forfiltCheating = ', timeit.timeit(forfiltCheating, psycosetup + setup, number = 10000)
Here are the results:
list comp = 6.56407690048
filtering = 5.64738512039
forfilter = 7.31555104256
forfiltCheating = 4.8994679451
now with psyco
list comp = 8.0485959053
filtering = 7.79016900063
forfilter = 9.00477004051
forfiltCheating = 4.90830993652
I must be doing something wrong with psyco, because it actually runs slower.
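One thing that may be worth checking in the script above: marked in forfilt is a generator expression, so the indices are produced lazily while b is already being modified, and the shift correction is then applied on top of indices taken from the shrinking list. The forfilt functions also delete the items for which tokeep is true, whereas listcomp and filt keep them, so the four timings do not appear to measure the same work. The sketch below, in Python 3 with made-up data and a made-up predicate, shows the effect and a variant that collects the indices up front.

```python
def determine(x):
    """Hypothetical predicate: True for items that should be deleted."""
    return x < 0


def delete_lazy(items):
    # Same shape as forfilt above: indices come from a lazy generator.
    marked = (i for i, x in enumerate(items) if determine(x))
    shift = 0
    for n in marked:
        del items[n - shift]
        shift += 1


def delete_eager(items):
    # Indices are collected before any deletion, so `shift` is meaningful.
    marked = [i for i, x in enumerate(items) if determine(x)]
    shift = 0
    for n in marked:
        del items[n - shift]
        shift += 1


lazy = [-1, 10, -2, 20]
delete_lazy(lazy)
eager = [-1, 10, -2, 20]
delete_eager(eager)
print(lazy)    # [-2, 20]  (a wanted item was deleted, an unwanted one kept)
print(eager)   # [10, 20]
```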