Question

I want to find a phrase in a document. So far I have used this code from the quick start:

>>> from whoosh.index import create_in
>>> from whoosh.fields import *
>>> schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
>>> ix = create_in("indexdir", schema)
>>> writer = ix.writer()
>>> writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!")
>>> writer.add_document(title=u"Second document", path=u"/b",  content=u"The second one is even more interesting!")
>>> writer.commit()
>>> from whoosh.qparser import QueryParser
>>> with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse("first")
        results = searcher.search(query)
        results[0]

    result: {"title": u"First document", "path": u"/a"}

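But then I found that it splits the keywords into separate words and searches the documents for each of them. If I want to search the documents for a phrase like "the first one here", what should I do?

The documentation says to use

"it is a phrase"

if I want to search for:

it is a phrase.

That confuses me.

There is also a class that looks like it could help, but I don't know how to use it.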

class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)
 Matches documents containing a given phrase.

Update: I used it this way, but nothing matched.

from whoosh.index import create_in
from whoosh.fields import *
schema = Schema(title=TEXT(stored=True), path=ID(stored=True),   content=TEXT)
ix = create_in("indexdir", schema)
writer = ix.writer()
writer.add_document(title=u"First document", path=u"/a",
                 content=u"This is the first document we've added!")
writer.add_document(title=u"Second document", path=u"/b",
               content=u"The second one is even more interesting!")
writer.commit()
from whoosh.query import Phrase

a = Phrase("content", u"the first")

results = ix.searcher().search(a)
print results

Result:

<Top 0 Results for Phrase('content', u'the first', slop=1, boost=1.000000) runtime=0.0>

Update, following suggestions from others:

with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse('"first x document"')
    # Search and read the hit while the searcher is still open
    results = searcher.search(query)
    print results[0]

Result: <Hit {'content': u"This is the first document we've added!", 'path': u'/a', 'title': u'First document'}>

I think there should be no match, because "first x document" does not appear in the document. Otherwise, it is not an exact match.

Answer 1

You should give Phrase a list of words, not a string, as its second argument, and also drop "the" because it is a stop word:

a = Phrase("content", [u"first",u"document"])

instead of

a = Phrase("content", u"the first")

From the documentation:

class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)
Matches documents containing a given phrase.
Parameters:

fieldname - the field to search.

words - a list of words (unicode strings) in the phrase.

Phrase search also works naturally through QueryParser if you wrap the phrase in double quotes:

>>> with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse('"first document"')
        results = searcher.search(query)
        results[0]

Update: "first x document" matches because x, like every single-character word, is treated as a stop word and filtered out.

Answer 2

To find a phrase in the content, use phrase=True when defining the schema, like this

schema = Schema(title=TEXT(stored=True), content=TEXT(phrase=True))

Then put the double quotes inside single quotes, like this

query = QueryParser("content", schema=ix.schema).parse('"exact phrase"')

Whoosh will then match the exact phrase.
