Why is my binary search implementation so inefficient?

Time: 2014-06-22 14:20:31

Tags: python recursion binary-search

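I am working on a Python exercise that searches for a given word in a sorted wordlist containing more than 100,000 words.

Using bisect_left from the Python bisect module is very efficient, but the binary search method I wrote myself is very inefficient. Can someone clarify why?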

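Here is the search method that uses bisect_left from the Python bisect module. My own implementation, which is very inefficient (I don't know why), is not reproduced here: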

from bisect import bisect_left

def in_bisect(word_list, word):
    """Checks whether a word is in a list using bisection search.

    Precondition: the words in the list are sorted

    word_list: list of strings
    word: string
    """
    # Leftmost position at which word could be inserted to keep the list sorted
    i = bisect_left(word_list, word)
    # word is present only if that position is in range and holds word
    return i != len(word_list) and word_list[i] == word

2 answers:

Answer 0: (score: 2)

if word in wordlist[len(wordlist)/2:] 

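makes Python scan half of wordlist element by element, which defeats the purpose of writing a binary search: the in operator on a list is a linear search, and the slice copies that half of the list first. You are also not splitting the list in half correctly. The strategy of binary search is to cut the search space in half and then apply the same strategy only to the half that word can be in. To know which half is the right one to search, it is essential that wordlist is sorted. Here is a sample implementation that keeps track of the number of calls needed to determine whether word is in wordlist.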

import random

numcalls = 0
def bs(wordlist, word):
    # increment numcalls
    global numcalls
    numcalls += 1
    print('wordlist',wordlist)

    # base cases
    if not wordlist:
        return False
    length = len(wordlist)
    if length == 1:
        return wordlist[0] == word

    # split the list in half
    mid = length // 2 # mid index
    leftlist = wordlist[:mid]
    rightlist = wordlist[mid:]
    print('leftlist',leftlist)
    print('rightlist',rightlist)
    print()

    # recursion
    if word < rightlist[0]:
        return bs(leftlist, word) # word can only be in left list
    return bs(rightlist, word) # word can only be in right list

alphabet = 'abcdefghijklmnopqrstuvwxyz'
wl = sorted(random.sample(alphabet, 10))
print(bs(wl, 'm'))
print(numcalls)

I added some print statements so you can see what is going on. Here are two sample outputs. First: word is in wordlist

wordlist ['b', 'c', 'g', 'i', 'l', 'm', 'n', 'r', 's', 'v']
leftlist ['b', 'c', 'g', 'i', 'l']
rightlist ['m', 'n', 'r', 's', 'v']

wordlist ['m', 'n', 'r', 's', 'v']
leftlist ['m', 'n']
rightlist ['r', 's', 'v']

wordlist ['m', 'n']
leftlist ['m']
rightlist ['n']

wordlist ['m']
True
4

Second: word is not in wordlist

wordlist ['a', 'c', 'd', 'e', 'g', 'l', 'o', 'q', 't', 'x']
leftlist ['a', 'c', 'd', 'e', 'g']
rightlist ['l', 'o', 'q', 't', 'x']

wordlist ['l', 'o', 'q', 't', 'x']
leftlist ['l', 'o']
rightlist ['q', 't', 'x']

wordlist ['l', 'o']
leftlist ['l']
rightlist ['o']

wordlist ['l']
False
4

Note that if you double the size of the word list, i.e. use

wl = sorted(random.sample(alphabet, 20))

numcalls will, on average, be only one higher than for a wordlist of length 10, because the wordlist only has to be split in half one more time.
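One thing to keep in mind about bs above: each call builds two slices, and slicing copies the elements, so the number of calls grows logarithmically but the total copying is still linear in the list length. A minimal sketch of an index-based variant that avoids the copies follows. The names bs_index, lo, hi and steps are my own and do not come from the code above, and the zero-padded number strings are only a stand-in for a real sorted word list.

```python
def bs_index(wordlist, word):
    """Return (found, steps) for a sorted list, without slicing it."""
    lo, hi = 0, len(wordlist)      # search the half-open index range [lo, hi)
    steps = 0                      # number of halving steps taken
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if wordlist[mid] < word:
            lo = mid + 1           # word can only be to the right of mid
        elif wordlist[mid] > word:
            hi = mid               # word can only be to the left of mid
        else:
            return True, steps     # found it at index mid
    return False, steps            # range is empty, so word is absent

# Hypothetical stand-in for a sorted word list with 100,000 entries
words = [format(i, '06d') for i in range(100000)]

print(bs_index(words, '054321'))   # a word that is in the list
print(bs_index(words, 'zzz'))      # a word that is not in the list
```

With 100,000 entries the loop should need no more than 17 steps for either lookup, since each step halves the remaining index range.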

Answer 1: (score: 0)

To simply check whether a word is in the word list (Python 2.7):

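import bisect

def bisect_fun(listfromfile, wordtosearch):
    # Leftmost index at which wordtosearch could be inserted
    bi = bisect.bisect_left(listfromfile, wordtosearch)
    # bi can equal len(listfromfile), so check the range before indexing
    if bi < len(listfromfile) and listfromfile[bi] == wordtosearch:
        return listfromfile[bi], bi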
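A caveat when indexing with the result of bisect_left: if the word sorts after every element, the returned index equals the length of the list, so it has to be range-checked before it is used. A small sketch of that edge case follows. The three-word list and the helper name contains are made up for illustration.

```python
from bisect import bisect_left

words = ['apple', 'banana', 'cherry']   # hypothetical sorted word list

# 'zebra' sorts after every entry, so the insertion point is len(words)
i = bisect_left(words, 'zebra')
print(i, len(words))                    # words[i] would raise IndexError here

def contains(sorted_words, word):
    """Membership test that checks the index range before comparing."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

print(contains(words, 'banana'))
print(contains(words, 'zebra'))
```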