What is the most efficient way to search nested lists in Python?

Asked: 2012-08-15 03:22:08

Tags: python

I have a list that contains nested lists, and I need to know the most efficient way to search within those nested lists.

For example, if I have

[['a','b','c'],
['d','e','f']]

and I have to search the entire list above, what is the most efficient way to find 'd'?

5 answers:

Answer 0 (score: 11)

>>> lis=[['a','b','c'],['d','e','f']]
>>> any('d' in x for x in lis)
True
Timings for a generator expression using any:

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "any('d' in x for x in lis)" 
1000000 loops, best of 3: 1.32 usec per loop

Generator expression without any:

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 1.56 usec per loop

List comprehension:

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 3.23 usec per loop
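The three measurements above can also be reproduced from inside Python with the standard timeit module. This is only a minimal sketch: the labels and the loop count of 20000 are my own choices, and the absolute numbers will differ from machine to machine.

```python
import timeit

# Same setup string as in the shell commands above.
setup = ("lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],"
         "[7,8,9],[10,11,12],[13,14,15],[16,17,18]]")

# Labels are illustrative; the statements are the ones timed above.
statements = {
    "any + generator": "any('d' in x for x in lis)",
    "flattening generator": "'d' in (y for x in lis for y in x)",
    "list comprehension": "'d' in [y for x in lis for y in x]",
}

number = 20000  # loops per run (assumed value, smaller than timeit's default)
results = {}
for label, stmt in statements.items():
    # repeat() returns one total time per run; the minimum is the least noisy.
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))
    results[label] = best / number * 1e6  # microseconds per loop
    print(f"{label}: {results[label]:.2f} usec per loop")
```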

What if the item is near the end, or not present at all? any is faster than the list comprehension:

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'NOT THERE' in [y for x in lis for y in x]"
100000 loops, best of 3: 4.4 usec per loop

$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "any('NOT THERE' in x for x in lis)"
100000 loops, best of 3: 3.06 usec per loop

What if the list is 1000 times longer? any is still faster:

$ python -m timeit -s "lis=1000*[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'NOT THERE' in [y for x in lis for y in x]"
100 loops, best of 3: 3.74 msec per loop

$ python -m timeit -s "lis=1000*[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "any('NOT THERE' in x for x in lis)"
100 loops, best of 3: 2.48 msec per loop

We know that generators take a while to set up, so the list comprehension's best chance of winning is a very short list:

$ python -m timeit -s "lis=[['a','b','c']]" "any('c' in x for x in lis)"
1000000 loops, best of 3: 1.12 usec per loop

$ python -m timeit -s "lis=[['a','b','c']]" "'c' in [y for x in lis for y in x]"
1000000 loops, best of 3: 0.611 usec per loop

any also uses less memory, because it never builds the flattened list.
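To get a rough feel for the memory difference, one can compare the size of the flattened list with the size of a generator object. This is a sketch, not a proper memory profile: sys.getsizeof only reports the container's own footprint, not the elements it references, and the variable names are mine.

```python
import sys

lis = 1000 * [['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],
              [7,8,9],[10,11,12],[13,14,15],[16,17,18]]

# The list comprehension materialises every element before `in` runs.
flat_list = [y for x in lis for y in x]
# The generator expression only holds its iteration state.
flat_gen = (y for x in lis for y in x)

list_size = sys.getsizeof(flat_list)
gen_size = sys.getsizeof(flat_gen)
print(f"flattened list: {len(flat_list)} items, {list_size} bytes")
print(f"generator object: {gen_size} bytes")
```

The generator's size stays the same however long lis gets, while the flattened list grows with the total number of elements.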

Answer 1 (score: 5)

Using a list comprehension, given:

mylist = [['a','b','c'],['d','e','f']]
'd' in [j for i in mylist for j in i]

yields:

True

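This can also be done with a generator (as shown by @AshwiniChaudhary).

Update based on the comments below:

Here is the same list comprehension, but with more descriptive variable names: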

'd' in [elem for sublist in mylist for elem in sublist]

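The loop structure inside the list comprehension is equivalent to

for sublist in mylist:
    for elem in sublist:
        ...

and it produces a flat list, against which 'd' can be tested with the in operator.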
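Spelled out in full, the equivalence looks roughly like this. The name flattened is only for illustration; the comprehension itself never binds the intermediate list to a name.

```python
mylist = [['a', 'b', 'c'], ['d', 'e', 'f']]

# Explicit version of [elem for sublist in mylist for elem in sublist]
flattened = []
for sublist in mylist:
    for elem in sublist:
        flattened.append(elem)

print(flattened)         # ['a', 'b', 'c', 'd', 'e', 'f']
print('d' in flattened)  # True
```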

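Answer 2 (score: 4)

Use a generator expression. The whole list is not traversed here, because a generator yields its results one at a time and the in test stops at the first match: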

>>> lis = [['a','b','c'],['d','e','f']]
>>> 'd' in (y for x in lis for y in x)
True
>>> gen = (y for x in lis for y in x)
>>> 'd' in gen
True
>>> list(gen)
['e', 'f']
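The leftover ['e', 'f'] above already hints at the early exit. To make it explicit, here is a small sketch with a generator function that records every element it hands out; flatten_logged and visited are names I made up for the demonstration.

```python
lis = [['a', 'b', 'c'], ['d', 'e', 'f']]
visited = []  # every element the generator has produced so far

def flatten_logged(nested):
    """Yield the elements of each sublist in order, logging each one."""
    for sub in nested:
        for elem in sub:
            visited.append(elem)
            yield elem

found = 'd' in flatten_logged(lis)
print(found)    # True
print(visited)  # ['a', 'b', 'c', 'd']
```

The search stops as soon as 'd' turns up, so 'e' and 'f' are never produced.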

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
    100000 loops, best of 3: 2.96 usec per loop

~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
    100000 loops, best of 3: 7.4 usec per loop

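Answer 3 (score: 2)

If your arrays are always sorted the way you show them, so that a[i][j] <= a[i][j+1] and a[i][-1] <= a[i+1][0] (the last element of one array is always less than or equal to the first element of the next array), then you can eliminate a lot of comparisons by doing something like this: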

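# Wrapped in a function (the name is illustrative) so that return is valid.
# a is your big array; every subarray is assumed to be non-empty.
def search_sorted(a, theValue):
    previous = None
    for subarray in a:
        # Since the subarrays are sorted, once a subarray starts with a
        # larger value we know theValue is not in it, so it can only be
        # in the previous one
        if subarray[0] > theValue:
            break
        # Otherwise, we keep track of the last array we looked at
        previous = subarray

    return (theValue in previous) if previous else False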

This optimization is worthwhile if you have many arrays and each of them has many elements.
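Under the same sortedness assumption, the comparisons can be cut down further with binary search. Note that this swaps in a different technique (the standard bisect module) from the linear scan described above. It is only a sketch: contains_sorted_bisect is a made-up name, and every subarray is assumed to be non-empty.

```python
from bisect import bisect_left, bisect_right

def contains_sorted_bisect(nested, value):
    """Return True if value occurs in nested, assuming each sublist is
    sorted and non-empty, and sublist i ends at or below the start of
    sublist i + 1."""
    # First elements of the sublists; this could be precomputed once
    # if many lookups are made against the same data.
    firsts = [sub[0] for sub in nested]
    # Index of the last sublist whose first element is <= value.
    i = bisect_right(firsts, value) - 1
    if i < 0:
        return False  # value is smaller than everything (or nested is empty)
    sub = nested[i]
    # Binary search inside the single candidate sublist.
    j = bisect_left(sub, value)
    return j < len(sub) and sub[j] == value

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(contains_sorted_bisect(data, 5))   # True
print(contains_sorted_bisect(data, 10))  # False
```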

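Answer 4 (score: 0)

If you only want to know whether your element exists in the list, you can do that by converting the list to a string and checking the string. This extends to arbitrarily nested lists such as [[1],'a','b','d',['a','b',['c',1]]]. The approach is useful when you don't know how deeply the list is nested and just want to know whether the item you are searching for is present.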

    search='d'
    lis = [['a',['b'],'c'],[['d'],'e','f']]
    print(search in str(lis))
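One thing to be aware of: the string check matches substrings, so 'd' in str([['add']]) is also True. If exact matches are needed at unknown nesting depth, a recursive search is one alternative. The sketch below is not part of the approach above; contains_nested is a hypothetical helper name, and only list objects are treated as nested containers.

```python
def contains_nested(items, target):
    """Return True if target equals an element at any nesting depth."""
    for item in items:
        if isinstance(item, list):
            # Descend into the nested list and stop at the first match.
            if contains_nested(item, target):
                return True
        elif item == target:
            return True
    return False

lis = [['a', ['b'], 'c'], [['d'], 'e', 'f']]
print(contains_nested(lis, 'd'))        # True
print(contains_nested([['add']], 'd'))  # False
print('d' in str([['add']]))            # True (substring match)
```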