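Why doesn't DuplicateFilter work together with other filters? For example, if I slightly rework the DuplicateFilterTest test, I get the impression that the filter is not applied on top of the other filters, but instead trims the results first: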
public void testKeepsLastFilter() throws Throwable {
    DuplicateFilter df = new DuplicateFilter(KEY_FIELD);
    df.setKeepMode(DuplicateFilter.KM_USE_LAST_OCCURRENCE);
    Query q = new ConstantScoreQuery(new ChainedFilter(new Filter[]{
        new QueryWrapperFilter(tq),
        // new QueryWrapperFilter(new TermQuery(new Term("text", "out"))), // Works correctly: it is the last document.
        new QueryWrapperFilter(new TermQuery(new Term("text", "now"))) // Why doesn't this work? It is the third document, but the hit count is 0.
    }, ChainedFilter.AND));
    // These variants return no hits either:
    // ScoreDoc[] hits = searcher.search(new FilteredQuery(tq, df), new QueryWrapperFilter(new TermQuery(new Term("text", "now"))), 1000).scoreDocs;
    // ScoreDoc[] hits = searcher.search(new FilteredQuery(tq, new QueryWrapperFilter(new TermQuery(new Term("text", "now")))), df, 1000).scoreDocs;
    ScoreDoc[] hits = searcher.search(q, df, 1000).scoreDocs;
    assertTrue("Filtered searching should have found some matches", hits.length > 0);
    for (int i = 0; i < hits.length; i++) {
        Document d = searcher.doc(hits[i].doc);
        String url = d.get(KEY_FIELD);
        // Walk every document with this key to find the last occurrence in the index.
        TermDocs td = reader.termDocs(new Term(KEY_FIELD, url));
        int lastDoc = 0;
        while (td.next()) {
            lastDoc = td.doc();
        }
        assertEquals("Duplicate urls should return last doc", lastDoc, hits[i].doc);
    }
}
Answer 0 (score: 2)
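DuplicateFilter independently constructs a filter that, for each key, selects either the first or the last occurrence among all documents containing that key. This result can be cached with minimal memory overhead.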
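Your second filter independently selects some other set of documents. The two selections may not coincide: if the last document for a key is not one that your second filter matches, their intersection is empty for that key. To filter duplicates relative to some arbitrary subset of all documents, you would probably need to use a field cache to get acceptable performance, and that is where things get expensive in terms of RAM.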
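To illustrate the point, here is a minimal sketch in plain Java that imitates the two filters with `java.util.BitSet` instead of real Lucene classes. Everything in it is an assumption made for illustration: the class name `DuplicateFilterSketch`, the helper methods, and the four-document index in which every document shares one key, "now" appears in the third document, and "out" appears in the last one. It is not how Lucene implements DuplicateFilter internally; it only shows why AND-ing two independently built selections can come back empty.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class DuplicateFilterSketch {
    // Hypothetical index: array position = doc id. All four docs share one key.
    static final String[] KEYS = {"http://a", "http://a", "http://a", "http://a"};
    static final String[] TEXT = {"first", "second", "now", "out"};

    // Keep only the last occurrence of each key among the candidate docs.
    static BitSet lastOccurrencePerKey(String[] keys, BitSet candidates) {
        Map<String, Integer> last = new HashMap<>();
        // Candidates are visited in ascending doc id order, so later docs overwrite earlier ones.
        for (int doc = candidates.nextSetBit(0); doc >= 0; doc = candidates.nextSetBit(doc + 1)) {
            last.put(keys[doc], doc);
        }
        BitSet bits = new BitSet(keys.length);
        for (int doc : last.values()) {
            bits.set(doc);
        }
        return bits;
    }

    // Stand-in for a term filter: docs whose text equals the term.
    static BitSet matchText(String[] text, String term) {
        BitSet bits = new BitSet(text.length);
        for (int doc = 0; doc < text.length; doc++) {
            if (text[doc].equals(term)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    static BitSet allDocs(int n) {
        BitSet bits = new BitSet(n);
        bits.set(0, n);
        return bits;
    }

    // What the question does: de-duplicate over ALL docs, then AND with the term filter.
    static BitSet independentAnd(String term) {
        BitSet dedup = lastOccurrencePerKey(KEYS, allDocs(KEYS.length));
        dedup.and(matchText(TEXT, term));
        return dedup;
    }

    // What the question expects: de-duplicate only within the docs the term filter matched.
    static BitSet dedupWithinSubset(String term) {
        return lastOccurrencePerKey(KEYS, matchText(TEXT, term));
    }

    public static void main(String[] args) {
        System.out.println(independentAnd("now"));    // prints {}
        System.out.println(independentAnd("out"));    // prints {3}
        System.out.println(dedupWithinSubset("now")); // prints {2}
    }
}
```

With the independent filters, "now" gets no hits because the only document kept for the key is the last one, which contains "out". De-duplicating inside the matched subset returns the third document, but it has to look up the key of every matching document at query time, which is the part that would likely need something like a field cache in a real index.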