Question

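According to the Whoosh documentation (and this previous question on SO), you can search for an exact phrase in Whoosh by wrapping the phrase in double quotes. However, when I try to implement an exact phrase search, I get back results that look like they were produced by the default search syntax. Does anyone know how I can change my search syntax so that it matches only those parts of the queried document (Project Gutenberg's Gulliver's Travels) that contain the exact phrase "government of reason"? I would be grateful for any pointers others can offer.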

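import html
import os
import re

from whoosh import analysis, qparser
from whoosh.fields import Schema, TEXT
from whoosh.index import create_in

def remove_non_ascii(s):
    return "".join(x for x in s if ord(x) < 128)

if not os.path.exists("indexdir"):
    os.mkdir("indexdir")

# stoplist=None keeps stop words such as "of" in the index, so they can be part of a phrase
schema = Schema(content=TEXT(stored=True, analyzer=analysis.StandardAnalyzer(stoplist=None)))

ix = create_in("indexdir", schema)
writer = ix.writer()
with open("gulliver.txt", encoding="utf-8") as f:
    gulliver = f.read().replace("_", "")
writer.add_document(content=gulliver)
writer.commit()

parser = qparser.QueryParser("content", schema=ix.schema)
# The double quotes inside the query string make the parser build a phrase query;
# without them the three words are simply ANDed together
q = parser.parse('"government of reason"')

with ix.searcher() as searcher:
    results = searcher.search(q)
    results.fragmenter.charlimit = None  # analyze the whole text, not just the default character limit
    for hit in results:
        # highlights() returns HTML: strip the tags and unescape entities
        # (nltk.clean_html was removed in NLTK 3)
        fragment = hit.highlights("content", top=1000000)
        text = html.unescape(re.sub(r"<[^>]+>", "", fragment))
        print(" ".join(remove_non_ascii(text).split()))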
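The distinction the question hinges on: with Whoosh's default QueryParser, the bare words government of reason are combined with AND, so any document containing all three words anywhere matches, while a quoted phrase additionally requires the words at consecutive positions. Because the index holds a single document (the whole book), both queries can only ever return that same one hit, which may be why the results look unchanged. The plain-Python sketch below only illustrates the two matching rules; it is not Whoosh's implementation, and tokenize, matches_all_terms and matches_phrase are hypothetical helpers built on a deliberately crude tokenizer.

```python
import re

# Hypothetical helper: a crude stand-in for an analyzer with no stop list
# (lowercased word tokens), not Whoosh's own tokenizer.
def tokenize(text):
    return re.findall(r"\w+", text.lower())

def matches_all_terms(text, query):
    """AND match: every query term occurs somewhere in the text, in any order."""
    tokens = set(tokenize(text))
    return all(term in tokens for term in tokenize(query))

def matches_phrase(text, phrase):
    """Phrase match: the query terms occur at consecutive token positions."""
    tokens = tokenize(text)
    terms = tokenize(phrase)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

text = "Reason is the basis of any government."
print(matches_all_terms(text, "government of reason"))  # True
print(matches_phrase(text, "government of reason"))     # False
```

The same sentence satisfies the AND rule but not the phrase rule, which is exactly the gap between the unquoted and quoted queries.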

Edit

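Matt Chaput has provided some code in this short post that is supposed to return only the exact phrase in the top highlights for a given query, but I can't get his method to work.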

Python: Highlighting exact phrase search results in Whoosh
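If the highlighter still marks every occurrence of the individual terms, one possible workaround is to locate the exact phrase in the stored text directly and cut a window of context around each occurrence. The sketch below uses only the standard library re module, not Whoosh's highlighting API; phrase_snippets and its surround parameter are hypothetical names. In the question's code, the text could be taken from hit["content"], since the field is declared with stored=True.

```python
import re

def phrase_snippets(text, phrase, surround=40):
    """Hypothetical helper: return context windows around each exact,
    case-insensitive occurrence of `phrase` in `text`."""
    words = phrase.split()
    if not words:
        return []
    # \s+ lets the phrase span line breaks; \b stops "reason" matching inside "reasons"
    pattern = re.compile(
        r"\b" + r"\s+".join(map(re.escape, words)) + r"\b", re.IGNORECASE
    )
    snippets = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - surround)
        end = min(len(text), m.end() + surround)
        # Collapse whitespace so wrapped lines print on a single line
        snippets.append(" ".join(text[start:end].split()))
    return snippets

sample = "They were ruled by a\ngovernment of reason, and by no other."
for snippet in phrase_snippets(sample, "government of reason", surround=10):
    print(snippet)
```

The fixed-width window can cut words at its edges; a more careful version could widen each window to the nearest whitespace.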

