Question

This code comes straight from the Whoosh quickstart docs:

import os.path
from whoosh.index import create_in
from whoosh.fields import Schema, STORED, ID, KEYWORD, TEXT
from whoosh.qparser import QueryParser

#establish schema to be used in the index
schema = Schema(title=TEXT(stored=True), content=TEXT,
                path=ID(stored=True), tags=KEYWORD, icon=STORED)

#create index directory
if not os.path.exists("index"):
    os.mkdir("index")

#create the index using the schema specified above
ix = create_in("index", schema)

#instantiate the writer object
writer = ix.writer()

#add the docs to the index
writer.add_document(title=u"My document", content=u"This is my document!",
                    path=u"/a", tags=u"first short", icon=u"/icons/star.png")
writer.add_document(title=u"Second try", content=u"This is the second example.",
                    path=u"/b", tags=u"second short", icon=u"/icons/sheep.png")
writer.add_document(title=u"Third time's the charm", content=u"Examples are many.",
                    path=u"/c", tags=u"short", icon=u"/icons/book.png")

#commit those changes
writer.commit()

#identify searcher
with ix.searcher() as searcher:

    #specify parser
    parser = QueryParser("content", ix.schema)

    #specify query -- try also "second"
    myquery = parser.parse("is")

    #search for results
    results = searcher.search(myquery)

    #identify the number of matching documents
    print(len(results))

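I am passing just one value, the verb "is", to the parser.parse() call. However, when I run this, I get results of length zero instead of the expected length of 2. If I replace "is" with "second", I get one result, as expected. Why doesn't the search for "is" produce any matches?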

Edit

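As @Philippe pointed out, the default Whoosh text analyzer removes stop words, which explains the behavior above. To keep stop words, specify the analyzer to use for the searched field (here `content`) when you define the schema, and pass the analyzer an argument that disables stop-word removal, e.g.:

from whoosh import analysis
schema = Schema(title=TEXT(stored=True), content=TEXT(analyzer=analysis.StandardAnalyzer(stoplist=None)), path=ID(stored=True), tags=KEYWORD, icon=STORED)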
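To see why the counts come out as 0, 1 and 2, here is a minimal pure-Python model of the behavior. It is a sketch under stated assumptions, not Whoosh's code: the `analyze`, `build_index` and `search` helpers and the abbreviated stop-word set are hypothetical stand-ins for StandardAnalyzer, the index and Whoosh's real stop-word list. What it illustrates is that the same analyzer runs at index time and at query time, so a stop word is both missing from the index and dropped from the parsed query.

```python
import re

# Hypothetical, abbreviated stop-word set; Whoosh's real list is longer.
STOP_WORDS = frozenset({"a", "is", "the", "this", "are"})

def analyze(text, stoplist=STOP_WORDS):
    """Tokenize on word characters, lowercase, then drop stop words.

    A simplified stand-in for StandardAnalyzer; stoplist=None skips the
    stop-word step, mirroring StandardAnalyzer(stoplist=None).
    """
    tokens = (t.lower() for t in re.findall(r"\w+", text))
    if stoplist is None:
        return list(tokens)
    return [t for t in tokens if t not in stoplist]

# The "content" values from the quickstart example, keyed by path.
DOCS = {
    "/a": "This is my document!",
    "/b": "This is the second example.",
    "/c": "Examples are many.",
}

def build_index(docs, stoplist=STOP_WORDS):
    """Map each indexed term to the set of paths that contain it."""
    index = {}
    for path, text in docs.items():
        for term in analyze(text, stoplist):
            index.setdefault(term, set()).add(path)
    return index

def search(index, query, stoplist=STOP_WORDS):
    """AND together the query terms after running the same analyzer."""
    terms = analyze(query, stoplist)
    if not terms:
        # Every query word was a stop word, so nothing is left to match.
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits

default_index = build_index(DOCS)
print(len(search(default_index, "is")))      # 0: "is" was never indexed
print(len(search(default_index, "second")))  # 1
kept_index = build_index(DOCS, stoplist=None)
print(len(search(kept_index, "is", stoplist=None)))  # 2
```

Note that disabling the stop list has to apply to both sides: the documents must be re-indexed with it, and the query must be analyzed the same way. In Whoosh this happens naturally because QueryParser uses the field's own analyzer.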

Answer 1

The default text analyzer applies a stop-word filter: https://bitbucket.org/mchaput/whoosh/src/999cd5fb0d110ca955fab8377d358e98ba426527/src/whoosh/analysis/filters.py?at=default#cl-41

See also the docs: http://whoosh.readthedocs.org/en/latest/api/analysis.html#whoosh.analysis.StopFilter
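The linked StopFilter documentation describes three parameters: `stoplist`, `minsize` (default 2) and `maxsize` (default None, meaning no upper limit). A token is stopped if its text is in the stop list, is shorter than `minsize`, or is longer than `maxsize` when that is set. The generator below is a hypothetical pure-Python sketch of just that rule, applied to already-lowercased strings; Whoosh's real filter operates on Token objects and also renumbers positions, which is omitted here.

```python
def stop_filter(tokens, stoplist=frozenset(), minsize=2, maxsize=None):
    """Yield only the tokens a StopFilter-like rule would keep (sketch)."""
    for text in tokens:
        if text in stoplist:
            continue  # common word, e.g. "is"
        if len(text) < minsize:
            continue  # too short to be worth indexing
        if maxsize is not None and len(text) > maxsize:
            continue  # too long
        yield text

words = ["this", "is", "a", "second", "example"]
print(list(stop_filter(words, stoplist={"this", "is"})))
# ['second', 'example']
print(list(stop_filter(words)))
# ['this', 'is', 'second', 'example']  ("a" falls below minsize)
print(list(stop_filter(words, maxsize=4)))
# ['this', 'is']
```

"is" has length 2, so it passes the default `minsize`; it is removed only because it appears in the stop list.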

Python: Whoosh seems to return incorrect results
