Question

Suppose there is a huge file containing text -

Contents

"Hello, How are you?
This is Bob
The contents of the file needs to be searched
and I'm a very huge file"

Search string

Bob

Now I need to search for the word "Bob" in the file using binary search. How can I do that?

I tried sorting the file with UNIX sort and got the following output -

and I'm a very huge file
How are you?
The contents of the file needs to be searched
This is Bob

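It sorts the file, but the word "Bob" ends up on the last line.

The problem is that I am not searching for a whole line, but for a single word within the file.

What is the most efficient way to do this?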

Answer 1

The most efficient way is to create a generator that yields individual words, and then compare each one with the word you are looking for.

def get_next_word():
    with open("Input.txt") as in_file:
        for line in in_file:
            for word in line.strip().split():
                yield word

print(any(word == "Bob" for word in get_next_word()))
# True

We use the any function, which short-circuits as soon as it finds a match, so we don't have to process the whole file.
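To see the short-circuit at work, here is a minimal sketch, assuming a small temporary file that holds the sample contents from the question. The helper names iter_words, contains_word and words_read_until are illustrative and not part of the original answer.

```python
import os
import tempfile


def iter_words(path):
    """Yield whitespace-separated words from the file, one at a time."""
    with open(path) as in_file:
        for line in in_file:
            for word in line.split():
                yield word


def contains_word(path, target):
    """Return True at the first match; any() stops pulling words after that."""
    return any(word == target for word in iter_words(path))


def words_read_until(path, target):
    """Count how many words are consumed before the first match is reached."""
    count = 0
    for word in iter_words(path):
        count += 1
        if word == target:
            break
    return count


# Assumed sample data: the four lines shown in the question.
sample = (
    "Hello, How are you?\n"
    "This is Bob\n"
    "The contents of the file needs to be searched\n"
    "and I'm a very huge file\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

found = contains_word(sample_path, "Bob")
consumed = words_read_until(sample_path, "Bob")
total = sum(1 for _ in iter_words(sample_path))
os.remove(sample_path)

print(found, consumed, total)
```

Here only the words up to and including "Bob" are read, while the rest of the file is never touched. In the worst case (the word is absent or at the very end) the whole file is still scanned once.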

Edit

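If you are going to search many times, the best approach is to collect the words into a set once and then use the in operator to check whether a word exists.

words_set = set(get_next_word())
print("Bob" in words_set)           # True
print("the" in words_set)           # True
print("thefourtheye" in words_set)  # False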
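For comparison with the binary search the question asks about, here is a minimal sketch, assuming the distinct words fit in memory. It sorts the words once and looks them up with the standard bisect module. The names sorted_words and contains are illustrative, and the sample text is inlined instead of being read from a file.

```python
from bisect import bisect_left

# Assumed sample data: the four lines shown in the question.
text = (
    "Hello, How are you?\n"
    "This is Bob\n"
    "The contents of the file needs to be searched\n"
    "and I'm a very huge file\n"
)

# Sort the distinct words once; every later lookup is a binary search.
sorted_words = sorted(set(text.split()))


def contains(words, target):
    """Binary search: True if target is present in the sorted list."""
    i = bisect_left(words, target)
    return i < len(words) and words[i] == target


print(contains(sorted_words, "Bob"))
print(contains(sorted_words, "the"))
print(contains(sorted_words, "thefourtheye"))
```

Each lookup here takes O(log n) comparisons, whereas a set lookup is O(1) on average, so the set is usually the simpler choice unless you also need the words in sorted order.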

Sort a huge text file and perform a binary search
