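I have populated my index with parent and child documents using "blocks", i.e. by adding each group of documents with the IndexWriter.addDocuments() method, with the parent document last.
So far I have only managed to search for "blocks" where any of the query terms appears in the parent or its children. This gives me skewed results: for example, the top hits are "blocks" in which a single term occurs many times and the other terms do not occur at all.
I want to search for "blocks" where all of the query terms must appear in either the parent or its children.
However, I am not sure how to construct the query.
My current query code is as follows: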
Analyzer analyzer = new EnglishAnalyzer();
//Note, both parent and child docs have a 'textContent' field
QueryParser queryParser = new QueryParser("textContent", analyzer);
Directory index = FSDirectory.open(Paths.get("${indexParentDir}/${name}.lucene"));
BitSetProducer parentsFilter = new QueryBitSetProducer(new TermQuery(new Term("child", "N")));
Query textQuery = queryParser.parse("foo bar");
//Construct child query
BooleanQuery.Builder childQueryBuilder = new BooleanQuery.Builder();
childQueryBuilder.add(new BooleanClause(textQuery, BooleanClause.Occur.MUST));
childQueryBuilder.add(new BooleanClause(new TermQuery(new Term("child", "Y")), BooleanClause.Occur.MUST));
Query childQuery = new ToParentBlockJoinQuery(childQueryBuilder.build(), parentsFilter, ScoreMode.Avg);
//Construct parent query
BooleanQuery.Builder parentQueryBuilder = new BooleanQuery.Builder();
parentQueryBuilder.add(new BooleanClause(textQuery, BooleanClause.Occur.MUST));
parentQueryBuilder.add(new BooleanClause(new TermQuery(new Term("child", "N")), BooleanClause.Occur.MUST));
//Construct join of child and parent query
BooleanQuery.Builder childAndParentQueryBuilder = new BooleanQuery.Builder();
childAndParentQueryBuilder.add(new BooleanClause(childQuery, BooleanClause.Occur.SHOULD));
childAndParentQueryBuilder.add(new BooleanClause(parentQueryBuilder.build(), BooleanClause.Occur.SHOULD));
Query childAndParentQuery = childAndParentQueryBuilder.build();
//Run the query
DirectoryReader reader = DirectoryReader.open(index);
CheckJoinIndex.check(reader, parentsFilter);
IndexSearcher searcher = new IndexSearcher(reader);
searcher.search(childAndParentQuery, 10);
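The code above returns, as its top results, blocks in which one of the terms occurs many times: for example, "foo" occurs 100 times in the parent or child documents, but "bar" does not occur at all.
I only want results where all of the terms (e.g. 'foo' and 'bar') appear in the parent or its children.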
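To make the difference between the two behaviours concrete, here is a minimal plain-Java sketch of the matching semantics only. It does not use Lucene; the class BlockMatchSketch, its methods, the whitespace tokenization and the sample texts are all hypothetical and only stand in for the real analyzer and index.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlockMatchSketch {
    // Collects the lower-cased, whitespace-separated tokens of every
    // document in a block (parent and children are treated alike).
    static Set<String> blockTerms(List<String> blockTexts) {
        Set<String> terms = new HashSet<>();
        for (String text : blockTexts) {
            for (String token : text.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
        }
        return terms;
    }

    // What I get today: the block matches if ANY query term occurs somewhere in it.
    static boolean matchesAny(List<String> blockTexts, List<String> queryTerms) {
        Set<String> terms = blockTerms(blockTexts);
        return queryTerms.stream().anyMatch(terms::contains);
    }

    // What I want: the block matches only if ALL query terms occur somewhere in it,
    // no matter whether each term is in the parent or in a child.
    static boolean matchesAll(List<String> blockTexts, List<String> queryTerms) {
        return blockTerms(blockTexts).containsAll(queryTerms);
    }

    public static void main(String[] args) {
        List<String> query = List.of("foo", "bar");
        List<String> onlyFoo = List.of("foo foo foo", "foo again"); // "bar" never occurs
        List<String> spread = List.of("foo here", "bar there");     // terms split across docs

        System.out.println(matchesAny(onlyFoo, query)); // true
        System.out.println(matchesAll(onlyFoo, query)); // false
        System.out.println(matchesAll(spread, query));  // true
    }
}
```

The block containing only "foo" is the kind of hit I want to exclude, while the block whose terms are spread across parent and child should still match.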
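One option is to create a field in the parent document that aggregates all of the textContent fields from the parent and its children, and to search only that new aggregated field. However, these indexes are already large (e.g. 50GB), and I still need textContent kept separate in the parent and children for display purposes, so creating an aggregate field would almost double the index size.
Any help would be appreciated.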
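Answer 0 (score: 0)
I solved this by using a DisjunctionMaxQuery instead of a BooleanQuery to join the parent and child queries together.
From the documentation:
...We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as BooleanQuery would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields...
Updated code: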
Analyzer analyzer = new EnglishAnalyzer();
//Note, both parent and child docs have a 'textContent' field
QueryParser queryParser = new QueryParser("textContent", analyzer);
Directory index = FSDirectory.open(Paths.get("${indexParentDir}/${name}.lucene"));
BitSetProducer parentsFilter = new QueryBitSetProducer(new TermQuery(new Term("child", "N")));
Query textQuery = queryParser.parse("foo bar");
//Construct child query
BooleanQuery.Builder childQueryBuilder = new BooleanQuery.Builder();
childQueryBuilder.add(new BooleanClause(textQuery, BooleanClause.Occur.MUST));
childQueryBuilder.add(new BooleanClause(new TermQuery(new Term("child", "Y")), BooleanClause.Occur.MUST));
Query childQuery = new ToParentBlockJoinQuery(childQueryBuilder.build(), parentsFilter, ScoreMode.Avg);
//Construct parent query
BooleanQuery.Builder parentQueryBuilder = new BooleanQuery.Builder();
parentQueryBuilder.add(new BooleanClause(textQuery, BooleanClause.Occur.MUST));
parentQueryBuilder.add(new BooleanClause(new TermQuery(new Term("child", "N")), BooleanClause.Occur.MUST));
Query parentQuery = parentQueryBuilder.build();
//Construct join of child and parent query
Query childAndParentQuery = new DisjunctionMaxQuery(Arrays.asList(childQuery, parentQuery), 0.5f);
//Run the query
DirectoryReader reader = DirectoryReader.open(index);
CheckJoinIndex.check(reader, parentsFilter);
IndexSearcher searcher = new IndexSearcher(reader);
searcher.search(childAndParentQuery, 10);
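To illustrate why swapping BooleanQuery for DisjunctionMaxQuery changes the ranking, here is a minimal plain-Java sketch of the two ways of combining clause scores. It does not call Lucene; the class DisMaxScoreSketch, its methods and the sample scores 4.0 and 1.0 are made up for illustration. The formula (highest clause score plus the tie-breaker times the sum of the remaining scores) is my understanding of how DisjunctionMaxQuery combines scores, so treat it as an approximation of the real scorer.

```java
import java.util.List;

public class DisMaxScoreSketch {
    // Dismax-style combination: the best clause dominates, and every other
    // matching clause contributes only tieBreaker times its score.
    static double disMax(List<Double> clauseScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double score : clauseScores) {
            max = Math.max(max, score);
            sum += score;
        }
        return max + tieBreaker * (sum - max);
    }

    // BooleanQuery-style combination of SHOULD clauses: a plain sum.
    static double booleanSum(List<Double> clauseScores) {
        double sum = 0.0;
        for (double score : clauseScores) {
            sum += score;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical scores: the child clause scores 4.0, the parent clause 1.0.
        List<Double> scores = List.of(4.0, 1.0);

        System.out.println(disMax(scores, 0.5)); // 4.5
        System.out.println(booleanSum(scores));  // 5.0
    }
}
```

With the 0.5f tie-breaker used above, a block whose child clause scores 4.0 and whose parent clause scores 1.0 gets 4.5 instead of the 5.0 that two SHOULD clauses in a BooleanQuery would give. Note that this only changes how matching blocks are ranked; a block still matches as soon as either clause matches.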