Question

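I want a program, search(file, list), that searches a USB flash drive for text files containing one or more words. If a file contains a word, the program should put that word in a list and move on to the next word. For each document in which it finds words, I want it to print a statement like "word[0], word[1], word[2] in this file directory". This is what I've tried so far: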

import os

def search(file, list):
    if list == []:
        return
    else:
        if os.path.isfile(file):
            try:
                infile = open(file, 'r')
                doc = infile.read()
            except:
                return
            infile.close()
            print ('Searching {}'.format(file))
            if list[0] in doc:
                print('{} in {}'.format(list[0], file))
        elif os.path.isdir(file):
            for item in os.listdir(file):
                itempath = os.path.join(file, item)
                search(itempath, list)
    return search(file, list[1:])

Answer 1

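You aren't iterating over your list to check the terms (by the way, don't use file and list as variable names: list is a built-in type, file was one in Python 2, and you're shadowing them). You have to do something like: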

found_words = []
for word in list:
    if word in doc:
        found_words.append(word)
if found_words:
    print('{} in {}'.format(", ".join(found_words), file))
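Applied to the question's function, a minimal sketch might look like the following. It keeps the asker's recursive directory traversal, renames the parameters to path and words to avoid shadowing built-ins, and assumes the files are UTF-8 text. It also returns the printed lines so the result can be checked; that return value is an addition for illustration, not part of the original code.

```python
import os

def search(path, words):
    """Recursively search path and print 'w1, w2 in file' for each matching file."""
    reports = []
    if os.path.isfile(path):
        try:
            with open(path, "r", encoding="utf-8") as infile:
                doc = infile.read()
        except (OSError, UnicodeDecodeError):
            return reports  # unreadable or non-UTF-8 file: skip it
        # Check every word against this one document instead of only words[0]
        found_words = [word for word in words if word in doc]
        if found_words:
            report = "{} in {}".format(", ".join(found_words), path)
            print(report)
            reports.append(report)
    elif os.path.isdir(path):
        # Recurse into each entry; sorted() just makes the output order stable
        for item in sorted(os.listdir(path)):
            reports.extend(search(os.path.join(path, item), words))
    return reports
```

Unlike the original, the word list is never sliced recursively: each file is read once and all words are checked against it in a single pass.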

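That is what you'd use instead if you want to check for all of the terms. However, you're making this far more complicated than it needs to be. For starters, you should use os.walk() to recursively traverse all subdirectories. Second, reading the whole file into memory isn't a good idea: not only will searching be slower on average, but the moment you hit a large file you may start running into memory problems...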

I would do it like this:

import os

def search(path, terms):
    result = {}  # store our result in the form "file_path": [found terms]
    start_path = os.path.abspath(os.path.realpath(path))  # full path, resolving a symlink
    for root, dirs, files in os.walk(start_path):  # recurse our selected dir
        for source in files:  # loop through each file
            source_path = os.path.join(root, source)  # full path to our file
            try:
                with open(source_path, "r") as f:  # open our current file
                    found_terms = []  # store for our potentially found terms
                    for line in f:  # loop through it line by line
                        for term in terms:  # go through all our terms and check for a match
                            if term in line and term not in found_terms:  # on this line and not yet recorded
                                found_terms.append(term)  # add the found term to our store
                    if found_terms:  # if we found any of the terms...
                        result[source_path] = found_terms  # store it in our result
            except (IOError, UnicodeDecodeError):
                pass  # ignore I/O and decoding errors; we could optionally store a list of failed files...
    return result

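It returns a dictionary whose keys are your file paths and whose values are the terms found in each file. So, for example, if you wanted to search the current folder (the folder the script is run from) for the words "import" and "export", you could do it with: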

search_results = search("./", ["import", "export"])
for key in search_results:
    print("{} in {}".format(", ".join(search_results[key]), key))

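It should print out the results you want. It could also use a file extension/type check, so you don't waste time trying to read unreadable or binary files. A codec check would also be in order, because depending on your files, reading their lines with the default decoding may raise a Unicode error. All in all, there's plenty of room for improvement...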
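The extension and codec suggestions can be sketched as follows. This is a minimal sketch, assuming UTF-8 files; the extensions allow-list and its defaults are hypothetical placeholders to adjust for the files on your drive. It also stops reading a file as soon as every term has been found.

```python
import os

def search(path, terms, extensions=(".txt", ".md", ".py"), encoding="utf-8"):
    """Map each matching file path to the list of terms found in it."""
    result = {}
    start_path = os.path.abspath(os.path.realpath(path))
    for root, dirs, files in os.walk(start_path):
        for source in files:
            # Skip files outside the (hypothetical) extension allow-list, e.g. binaries
            if not source.lower().endswith(tuple(extensions)):
                continue
            source_path = os.path.join(root, source)
            remaining = list(terms)  # terms not yet seen in this file
            found_terms = []
            try:
                # errors="replace" substitutes undecodable bytes instead of raising
                with open(source_path, "r", encoding=encoding, errors="replace") as f:
                    for line in f:
                        for term in list(remaining):
                            if term in line:
                                found_terms.append(term)
                                remaining.remove(term)
                        if not remaining:
                            break  # every term found; stop reading this file early
            except OSError:
                continue  # unreadable file (permissions, I/O error): skip it
            if found_terms:
                result[source_path] = found_terms
    return result
```

The terms in each list appear in the order they were first encountered in the file. Using errors="replace" trades exactness for robustness: a term that spans undecodable bytes will not match, but one bad file no longer aborts the search.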

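Also, keep in mind that you're not searching for a word, only for the presence of the passed character sequence. For example, if you searched for cat, it would also return files containing caterpillar. And there are dedicated tools that can do this job in a fraction of the time.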
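To match whole words only, one option (a swapped-in technique, not something the answer above uses) is a regular expression with \b word boundaries. A minimal sketch, assuming each search term consists of letters, digits or underscores (\b behaves differently around punctuation) and that case-sensitive matching is acceptable; find_whole_words is a hypothetical helper name:

```python
import re

def find_whole_words(text, words):
    """Return the words that occur in text as whole words, not as substrings."""
    found = []
    for word in words:
        # \b marks a word boundary; re.escape protects any special characters in word
        if re.search(r"\b" + re.escape(word) + r"\b", text):
            found.append(word)
    return found

print(find_whole_words("a caterpillar on the mat", ["cat", "mat"]))  # prints ['mat']
```

Calling it on each line in place of the plain `term in line` check would stop cat from matching caterpillar.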

Search a directory for files containing one or more words from a list
