Question

I have a set of documents, such as:

doc1: "world is great. hello world"
doc2: "lucene is great. hello world"
doc3: "worldwide population"
doc4: "nothing important"

I need a query that selects [doc1, doc3] because they contain world, but does not select [doc2], because it contains hello world.

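In other words, I need: "select all documents containing the word 'world', unless that word is part of 'hello world'." The number of occurrences of world in a document must be greater than the number of occurrences of hello world.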

Is this possible in a Lucene query, or do I need to preprocess the documents and replace every hello world with something that does not contain world?
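
To make the selection rule concrete, here is a minimal sketch in plain Python (not Lucene) that applies the counting condition to the four sample documents. It assumes case-insensitive substring counting, so "worldwide" counts as an occurrence of world, which is what selecting doc3 seems to imply. The names DOCS, matches and select are made up for illustration.

```python
# Sample documents from the question.
DOCS = {
    "doc1": "world is great. hello world",
    "doc2": "lucene is great. hello world",
    "doc3": "worldwide population",
    "doc4": "nothing important",
}


def matches(text, word="world", phrase="hello world"):
    """Return True when `word` occurs more often than `phrase`.

    Assumption: occurrences are counted as case-insensitive substrings,
    so "worldwide" contributes one occurrence of "world".
    """
    lowered = text.lower()
    return lowered.count(word) > lowered.count(phrase)


def select(docs):
    """Return the names of the documents that satisfy the rule, in input order."""
    return [name for name, text in docs.items() if matches(text)]


print(select(DOCS))  # prints ['doc1', 'doc3']
```

doc1 passes because world appears twice and hello world only once; doc2 fails because both counts are 1. The same comparison could serve as a preprocessing check before indexing, if the query itself cannot express it.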

It would be great if there were a negative boost, so that I could write world AND "hello world"^-1

Answer 1

Does the term hello world stay constant?

If so, can we add fq=NOT fieldname:"hello world" to the query?
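
A rough sketch of what the request parameters for that suggestion might look like, assuming a Solr-style endpoint that accepts q and fq. The field name "fieldname" is the placeholder from the answer, and build_params is a hypothetical helper; the snippet only builds and URL-encodes the parameters, it does not contact a server.

```python
from urllib.parse import urlencode

# Placeholder field name taken from the answer; replace with the real field.
FIELD = "fieldname"


def build_params(field=FIELD):
    """Build the main query plus a filter query that excludes the phrase."""
    return {
        "q": f"{field}:world",               # documents containing the term world
        "fq": f'NOT {field}:"hello world"',  # drop documents containing the phrase
    }


# URL-encoded query string, ready to append to a select URL.
print(urlencode(build_params()))
```

Note that on the sample documents a filter like this would also exclude doc1, because doc1 contains hello world too. It therefore only fits if excluding every document that contains the phrase is acceptable.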