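This question is about the Whoosh package in Python.
Whoosh: Link
Parsing user queries in Whoosh: link
I currently have the following problem:
The Whoosh searcher is very good at finding documents, but I have a problem with the highlight function. In the script below I search for '"anim id" OR voluptate', which means: find the string "anim id" or the string "voluptate".
However, when I apply the highlight function to the document, it also highlights the single word "anim" on its own. I don't want that. I only want highlights that follow the QueryParser rules ('"anim id" OR voluptate').
Does anyone know how to do this?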
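import os

from whoosh.index import create_in
from whoosh.qparser import QueryParser
from whoosh.fields import Schema, TEXT, ID

schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT(stored=True))
# create_in() expects the index directory to exist already
if not os.path.exists("index"):
    os.mkdir("index")
ix = create_in("index", schema)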
writer = ix.writer()
writer.add_document(title=u"First document", path=u"/a",
content=u"TLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et anim dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.")
writer.commit()
with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse('"anim id" OR voluptate')
    results = searcher.search(query)
    for hit in results:
        # highlights() joins the fragments with "..."; split to print one per line
        highlights = hit.highlights("content").split("...")
        for highlight in highlights:
            print(highlight)
Output:
ut labore et <b class="match term0">anim</b> dolore magna aliqua
in reprehenderit in <b class="match term1">voluptate</b> velit esse cillum
deserunt mollit <b class="match term0">anim</b> <b class="match term2">id</b> est laborum
But the output I need is:
in reprehenderit in <b class="match term1">voluptate</b> velit esse cillum
deserunt mollit <b class="match term0">anim</b> <b class="match term2">id</b> est laborum
The query should also be able to use the Boolean operators OR, AND, and NOT.
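To make the desired behavior concrete, here is a minimal sketch in plain Python that does not use Whoosh's highlighter at all. It marks a word only when the whole phrase occurs, so a lone "anim" stays unhighlighted. The helper names `phrase_spans` and `highlight` are hypothetical, the phrases are listed by hand rather than extracted from the parsed query, and AND/NOT are not handled.

```python
import re


def phrase_spans(text, phrases):
    """Return sorted (start, end) character spans where a whole phrase occurs."""
    spans = []
    for phrase in phrases:
        # The words of a phrase must appear consecutively, separated by whitespace.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
        spans.extend(m.span() for m in re.finditer(pattern, text, flags=re.IGNORECASE))
    return sorted(spans)


def highlight(text, phrases):
    """Wrap each whole-phrase match in <b>...</b>; lone words of a phrase stay plain."""
    out, pos = [], 0
    for start, end in phrase_spans(text, phrases):
        if start < pos:  # skip matches that overlap one already emitted
            continue
        out.append(text[pos:start])
        out.append("<b>" + text[start:end] + "</b>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


sample = "labore et anim dolore, in voluptate velit, mollit anim id est laborum"
marked = highlight(sample, ["anim id", "voluptate"])
print(marked)
```

Here a single-word term such as "voluptate" is treated as a one-word phrase, which is why the two cases share one code path.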