Strange filtering behavior in Lucene 5

Time: 2016-02-10 16:29:26

Tags: lucene

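In Lucene 5, Filter is deprecated in favor of wrapping an ordinary Query object in a ConstantScoreQuery. I ran into a case where a Query "translated" from an old Filter object does not behave the same way as the original.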

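import org.apache.lucene.analysis.core.KeywordAnalyzer
import org.apache.lucene.document.{Document, Field, StringField}
import org.apache.lucene.index.{DirectoryReader, IndexWriter, IndexWriterConfig, Term}
import org.apache.lucene.search._
import org.apache.lucene.store.RAMDirectory
// BooleanFilter and TermFilter are assumed to come from the lucene-queries module.
import org.apache.lucene.queries.{BooleanFilter, TermFilter}

val directory = new RAMDirectory()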
val config = new IndexWriterConfig(new KeywordAnalyzer())
val writer = new IndexWriter(directory, config)
writer.addDocument({
  val document = new Document()
  document.add(new StringField("k", "v1", Field.Store.YES))
  document.add(new StringField("k", "v2", Field.Store.YES))
  document
})
writer.addDocument({
  val document = new Document()
  document.add(new StringField("k", "v1", Field.Store.YES))
  document.add(new StringField("k", "v3", Field.Store.YES))
  document
})
writer.commit()

val reader = DirectoryReader.open(directory)
val searcher = new IndexSearcher(reader)

val filter =
  new BooleanQuery.Builder().add(
    new BooleanQuery.Builder()
      .add(new ConstantScoreQuery( new TermQuery( new Term("k", "v1") ) ), BooleanClause.Occur.MUST)
      .add(new ConstantScoreQuery( new TermQuery( new Term("k", "v2") ) ), BooleanClause.Occur.MUST_NOT)
      .build()
    ,
    BooleanClause.Occur.MUST_NOT
  ).build()

Console.println("filter: " + filter)
val results = searcher.search(filter, Int.MaxValue)
Console.println("# results: " + results.totalHits)

val filter2 = new BooleanFilter()

filter2.
  add({
    val inner = new BooleanFilter()
    inner add(new TermFilter(new Term("k", "v1")), BooleanClause.Occur.MUST)
    inner add(new TermFilter(new Term("k", "v2")), BooleanClause.Occur.MUST_NOT)
    inner
  }, BooleanClause.Occur.MUST_NOT)

Console.println("filter2: " + filter2)
val results2 = searcher.search(new MatchAllDocsQuery(), filter2, Int.MaxValue)
Console.println("# results2: " + results2.totalHits)

The console output is:

filter: -(+ConstantScore(k:v1) -ConstantScore(k:v2))
# results: 0
filter2: BooleanFilter(-BooleanFilter(+k:v1 -k:v2))
# results2: 1

As I see it, filter and filter2 should behave the same way in Lucene 5, but the results clearly differ. What am I doing wrong?

1 Answer:

Answer 0: (score: 0)

The answer seems to come from this SO post:

Weird Solr/Lucene behaviors with boolean operators

Quoting from it:

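A boolean query must contain at least one "positive" expression (i.e., MUST or SHOULD) in order to match. Solr tries to help with this: if asked to execute a BooleanQuery that contains only negated clauses at the topmost level, it adds a match-all-docs query (i.e., *:*).

If a top-level BooleanQuery contains a nested BooleanQuery which contains only negated clauses, that nested query will not be modified, and (by definition) it cannot match any documents. If it is required, that means the outer query will not match either.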
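The rule quoted above can be checked with a toy model that does not use Lucene at all. The sketch below treats each query as the set of document ids it matches; the names `BooleanModel`, `TermQ`, `AllDocs`, `BoolQ` and `matches` are made up for illustration, and scoring and minimum-should-match settings are ignored. Document 0 stands for the one holding v1 and v2, document 1 for the one holding v1 and v3.

```scala
// Toy model of BooleanQuery matching over document ids. This is NOT Lucene code;
// every name here is hypothetical and only illustrates the matching rule.
object BooleanModel {
  sealed trait Q
  final case class TermQ(docs: Set[Int]) extends Q // ids of documents containing a term
  case object AllDocs extends Q                    // stands in for MatchAllDocsQuery
  final case class BoolQ(
      must: List[Q] = Nil,
      should: List[Q] = Nil,
      mustNot: List[Q] = Nil
  ) extends Q

  def matches(q: Q, universe: Set[Int]): Set[Int] = q match {
    case TermQ(docs) => docs intersect universe
    case AllDocs     => universe
    case BoolQ(must, should, mustNot) =>
      // Candidates come only from positive clauses: all MUST clauses if any exist,
      // otherwise at least one SHOULD clause. With neither, there are no candidates.
      val positive: Set[Int] =
        if (must.nonEmpty) must.map(matches(_, universe)).reduce(_ intersect _)
        else should.map(matches(_, universe)).foldLeft(Set.empty[Int])(_ union _)
      // MUST_NOT clauses can only remove candidates, never add any.
      val excluded = mustNot.map(matches(_, universe)).foldLeft(Set.empty[Int])(_ union _)
      positive diff excluded
  }
}
```

In this model, `BoolQ(mustNot = List(inner))` starts from an empty candidate set, so it matches nothing no matter what `inner` excludes. Adding `AllDocs` as a SHOULD clause gives the negation something to subtract from.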

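So in short, I think I have to add a MatchAllDocsQuery to the BooleanQuery.Builder so that there is at least one MUST or SHOULD clause and the query can actually match something (otherwise it always matches nothing). Modifying filter as follows does the trick.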

val filter =
  new BooleanQuery.Builder().add(
    new BooleanQuery.Builder()
      .add(new ConstantScoreQuery( new TermQuery( new Term("k", "v1") ) ), BooleanClause.Occur.MUST)
      .add(new ConstantScoreQuery( new TermQuery( new Term("k", "v2") ) ), BooleanClause.Occur.MUST_NOT)
      .build()
    ,
    BooleanClause.Occur.MUST_NOT
  ).add(new MatchAllDocsQuery(), BooleanClause.Occur.SHOULD).build()
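If this pattern comes up in several places, the negation can be wrapped in a small helper. The following is only a sketch: `PureNegative` and `not` are hypothetical names, and it assumes a Lucene version that has both `BooleanQuery.Builder` and `BooleanClause.Occur.FILTER`. It uses FILTER rather than SHOULD for the match-all clause, which requires the clause without letting it contribute to the score; that seems reasonable when the query is used purely as a filter.

```scala
import org.apache.lucene.search.{BooleanClause, BooleanQuery, MatchAllDocsQuery, Query}

object PureNegative {
  // Hypothetical helper: builds a query matching every document that `q` does not match.
  // The MatchAllDocsQuery supplies the positive clause that a MUST_NOT-only query lacks.
  def not(q: Query): Query =
    new BooleanQuery.Builder()
      .add(new MatchAllDocsQuery(), BooleanClause.Occur.FILTER)
      .add(q, BooleanClause.Occur.MUST_NOT)
      .build()
}
```

With such a helper, the outer query from the question could be written as `PureNegative.not(inner)`, where `inner` is the nested `+k:v1 -k:v2` query.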