How can I search an entire drive for a string using Whoosh and Python?

Time: 2019-04-12 11:19:50

Tags: python whoosh

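I want to search a network drive for a string, or part of a string, as quickly as possible.

I have been experimenting with Whoosh, but I am not getting the right results: all I get are permission errors. If a file cannot be read, I would like the program to skip it and carry on until the whole drive has been searched.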

import os
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID


def createSearchableData(root):

    '''
    Schema definition: title(name of file), path(as ID), content(indexed
    but not stored),textdata (stored text content)
    '''
    schema = Schema(title=TEXT(stored=True),path=ID(stored=True),\
              content=TEXT,textdata=TEXT(stored=True))
    if not os.path.exists("indexdir"):
        os.mkdir("indexdir")

    # Create an index writer to add documents according to the schema
    ix = create_in("indexdir",schema)
    writer = ix.writer()


    filepaths = [os.path.join(root,i) for i in os.listdir(root)]
    for path in filepaths:
        try:
            fp = open(path,'r')
            text = fp.read()
            if "L" in text:
                print("Found the search data in:",path)
            # Use the file name as the title
            writer.add_document(title=os.path.basename(path), path=path,
                                content=text, textdata=text)
            fp.close()
        except Exception as e:
            print("ERROR",e,path)

    # Commit once after the loop; commit() closes the writer
    writer.commit()

root = r"Y:\Products ABC"  # raw string so the backslash is taken literally
createSearchableData(root)

0 answers:

No answers