Question

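I have a list/tuple of tuples/lists (it doesn't matter which one I use), where the inner lists or tuples hold values of variable size. I need to check whether a variable is in the first slot of the inner list or tuple.

The structure is as follows: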

[[[int, int, ..., int], ...], ... repeated about 20x]

Example:

([1, 21, 54, 55, 93, 99, 284, 393, 964, 1029, 1214, 1216, 1223, 1253, 1258, 1334, 1365, 1394, 1397, 1453, 1471, 1543, 1589, 1824, 1975, 2054, 2090, 2164, 2165, 2166, 2167, 2323, 2547, 2645, 2802, 2809, 2931, 2958, 3031, 3071, 3077, 3078, 3189, 3199, 3202, 3203], [1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 4, 2, 1, 1, 1, 1, 1, 2, 3, 1, 2, 1, 3, 3, 1, 2], [[3], [1], [4], [2], [12], [6], [3], [8], [20, 27], [11], [4, 7], [71], [133], [74, 74], [6], [67], [34], [3, 16], [9, 7, 23, 71], [11, 43], [67], [71], [4], [139], [16], [52], [4], [31], [7, 50], [2, 12], [1, 1, 81, 114], [13, 70], [60], [121], [30], [16], [214], [29, 78], [9, 37, 60], [14], [216, 249], [28], [2, 2, 21], [4, 18, 22], [59], [8, 24]])

This is just the first of the 20k+ similar elements in my list.

So I have a function that checks whether a number is in:

[1, 21, 54, 55, 93, 99, 284, 393, 964, 1029, 1214, 1216, 1223, 1253, 1258, 1334, 1365, 1394, 1397, 1453, 1471, 1543, 1589, 1824, 1975, 2054, 2090, 2164, 2165, 2166, 2167, 2323, 2547, 2645, 2802, 2809, 2931, 2958, 3031, 3071, 3077, 3078, 3189, 3199, 3202, 3203]

and returns its index.

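My function: iD is the number I am searching for, and posting is just the first element in my nested loop (the block directly above is an example of a posting).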

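def searchCurrentPosting(iD, posting):
    # Linear scan over the ids in posting[0]; returns the index of the first match.
    x = 0
    for each in posting[0]:
        if iD == each:
            return x
        x += 1
    # Note: False == 0 in Python, so callers should test the result with "is False".
    return False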

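I have to run this search function every time a new word is given (20k to the power of some given number). This code takes about a minute to run. Is there any way to shorten the time?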

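Edit: if you want my whole program, here it is:

This is my main driver: http://pastebin.com/Udjit7PP

The file it parses is the CACM collection, which is a standard for IR testing.

It uses a stemmer (Porter stemmer): http://pastebin.com/AzA0fvdV

And yes, I am creating an inverted index.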

Answer 1

Since your list at index 0 is sorted, you can use the bisect module to find the index in O(log N) time:

In [33]: import bisect

In [34]: lst = [1, 21, 54, 55, 93, 99, 284, 393, 964, 1029, 1214, 1216, 1223, 1253, 1258, 1334, 1365, 1394, 1397, 1453, 1471, 1543, 1589, 1824, 1975, 2054, 2090, 2164, 2165, 2166, 2167, 2323, 2547, 2645, 2802, 2809, 2931, 2958, 3031, 3071, 3077, 3078, 3189, 3199, 3202, 3203]

In [35]: n = 2802

In [36]: ind = bisect.bisect_left(lst, n)

In [37]: if ind < len(lst) and lst[ind] == n:
    ...:     print('Item found at {}'.format(ind))
    ...:     
Item found at 34
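The lookup above can be wrapped in a function with the same shape as the question's searchCurrentPosting. This is only a sketch: the name find_in_posting is made up for illustration, it assumes posting[0] is sorted in ascending order, and it returns None rather than False on a miss so that index 0 is not confused with "not found".

```python
import bisect


def find_in_posting(doc_id, posting):
    """Return the index of doc_id in posting[0], or None if it is absent.

    Hypothetical helper; assumes posting[0] is sorted in ascending order.
    """
    ids = posting[0]
    ind = bisect.bisect_left(ids, doc_id)
    # bisect_left returns len(ids) when doc_id is larger than every element,
    # so check the bound before indexing.
    if ind < len(ids) and ids[ind] == doc_id:
        return ind
    return None


# Shortened sample posting in the same (ids, counts, positions) layout.
posting = ([1, 21, 54, 55, 93], [1, 1, 1, 1, 1], [[3], [1], [4], [2], [12]])
print(find_in_posting(54, posting))    # 2
print(find_in_posting(1000, posting))  # None
```

The bounds check matters when the searched value is larger than every id in the list, since indexing with the position bisect_left returns would otherwise raise an IndexError.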

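Note that if the list is not sorted, it is better to sort it once up front and keep a reference to the sorted list in a variable, so that you don't have to sort it again for every search.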
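One caveat with sorting first is that positions in the sorted copy no longer match positions in the original list. A possible way around this, sketched here with made-up names (build_sorted_index and lookup are not from the question), is to sort (value, original index) pairs once and bisect over the sorted values:

```python
import bisect


def build_sorted_index(values):
    """Sort once; return (sorted values, their original positions)."""
    pairs = sorted((v, i) for i, v in enumerate(values))
    keys = [v for v, _ in pairs]
    positions = [i for _, i in pairs]
    return keys, positions


def lookup(keys, positions, target):
    """Return the original index of target, or None if it is absent."""
    ind = bisect.bisect_left(keys, target)
    if ind < len(keys) and keys[ind] == target:
        return positions[ind]
    return None


# Made-up unsorted sample data.
unsorted_ids = [93, 1, 55, 21, 54]
keys, positions = build_sorted_index(unsorted_ids)
print(lookup(keys, positions, 55))  # 2
print(lookup(keys, positions, 7))   # None
```

Because the pairs are ordered by value and then by original index, a duplicated value resolves to its first occurrence, which matches what list.index would return.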

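Another option is to use a dictionary with the items as keys and their indices as values (for duplicate items, store only the index of the first occurrence, i.e. the same behaviour as list.index). Once the dictionary has been created, you can get an item's index in O(1) time.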

In [38]: dct = {}

In [39]: for i, x in enumerate(lst):
    ...:     if x not in dct:
    ...:         dct[x] = i
    ...:         

In [40]: dct.get(n)
Out[40]: 34

In [41]: dct.get(1000)  # returns None for non-existent items
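Because the same postings are searched many times, the dictionary only pays off if it is built once per posting and then reused. A minimal sketch, assuming each posting is a tuple whose first element is the list of ids as in the question (build_lookup, postings and lookups are illustrative names, and the sample data is made up):

```python
def build_lookup(ids):
    """Map each id to the index of its first occurrence in ids."""
    lookup = {}
    for i, x in enumerate(ids):
        # setdefault keeps the first index seen, mirroring list.index.
        lookup.setdefault(x, i)
    return lookup


# Made-up sample postings in the (ids, counts, positions) layout.
postings = [
    ([1, 21, 54, 55], [1, 1, 1, 1], [[3], [1], [4], [2]]),
    ([7, 21, 21, 90], [1, 1, 1, 1], [[5], [9], [2], [8]]),
]

# Build one dictionary per posting up front, then reuse them for every query.
lookups = [build_lookup(posting[0]) for posting in postings]

print(lookups[0].get(54))   # 2
print(lookups[1].get(21))   # 1
print(lookups[1].get(500))  # None
```

The trade-off is memory: every posting carries an extra dictionary, so this suits the case where lookups greatly outnumber updates.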

Timing comparison:

In [43]: lst = list(range(10**5))

In [44]: %timeit bisect.bisect_left(lst, 10**5-5)
1000000 loops, best of 3: 444 ns per loop

In [45]: %timeit lst.index(10**5-5)
1000 loops, best of 3: 1.29 ms per loop

In [46]: %timeit dct.get(10**5-5) #dct created using the new list.
10000000 loops, best of 3: 104 ns per loop
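Outside IPython, the same comparison can be reproduced with the standard timeit module. This is a rough sketch; absolute numbers depend on the machine and Python version, so no expected timings are shown, and the call count of 200 is an arbitrary choice.

```python
import bisect
import timeit

lst = list(range(10**5))
# Values are unique here, so no duplicate handling is needed.
dct = {x: i for i, x in enumerate(lst)}
target = 10**5 - 5
calls = 200  # arbitrary; raise it for more stable measurements

results = {
    "bisect": timeit.timeit(lambda: bisect.bisect_left(lst, target), number=calls),
    "list.index": timeit.timeit(lambda: lst.index(target), number=calls),
    "dict.get": timeit.timeit(lambda: dct.get(target), number=calls),
}

for name, seconds in results.items():
    # Convert the total time for all calls into microseconds per call.
    print("{:<10} {:.3f} us per call".format(name, seconds / calls * 1e6))
```

The lambda wrappers add a small constant overhead to every measurement, so the figures are best read relative to each other.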

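If you are going to keep updating the list at index 0 and it is not sorted, then you should simply use list.index() rather than a loop, a dictionary, or bisect.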

In [47]: try:
    ...:     ind = lst.index(n)
    ...:     print('Item found at {}'.format(ind))
    ...: except ValueError:  # list.index raises ValueError when the item is missing
    ...:     pass
    ...: 
Item found at 34

Fastest search in Python to return the index in a tuple/list of lists
