I need to delete documents from a Lucene search index. The standard approach:
indexReader.deleteDocuments(new Term("field_name", "field value"));
does not do the job: I need to delete based on multiple fields. I need something like this:
(pseudo code)
TermAggregator terms = new TermAggregator();
terms.add(new Term("field_name1", "field value 1"));
terms.add(new Term("field_name2", "field value 2"));
indexReader.deleteDocuments(terms.toTerm());
Is there any such construct?
Answer 0 (score: 2)
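IndexWriter has methods that allow more powerful deletes, such as IndexWriter.deleteDocuments(Query). You can build a BooleanQuery from the combination of terms you want to delete on and pass it to that method.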
Answer 1 (score: 0)
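First, pay attention to which analyzer you are using. I was stuck for a while before realizing that the StandardAnalyzer filters out common words such as 'the' and 'a'. That is a problem when your field has the value 'A'. You may want to consider the KeywordAnalyzer instead:
See this post about the analyzer.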
// Create an analyzer:
// NOTE: We want the keyword analyzer so that it doesn't strip or alter any terms:
// In our example, the Standard Analyzer removes the term 'A' because it is a common English word.
// https://stackoverflow.com/a/9071806/231860
KeywordAnalyzer analyzer = new KeywordAnalyzer();
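To make the difference concrete, here is a toy model in plain Java. It is not Lucene code and not how Lucene's analyzers are implemented. The class name, both method names, and the stop-word list are assumptions made up for this illustration; the real stop-word set depends on the Lucene version and configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: this is NOT Lucene code and not how its analyzers are implemented.
class AnalyzerModel {
    // Hypothetical stop-word list for illustration; the real list depends on the Lucene version.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "and", "the"));

    // Roughly what a stop-word-filtering analyzer does: lowercase, split, drop stop words.
    static List<String> stopFilteringTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String raw : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Roughly what a keyword-style analyzer does: the whole value is one unmodified token.
    static List<String> keywordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(stopFilteringTokens("A"));             // prints []
        System.out.println(stopFilteringTokens("field value 1")); // prints [field, value, 1]
        System.out.println(keywordTokens("A"));                   // prints [A]
    }
}
```

In this model the value 'A' produces no tokens at all under stop-word filtering, so nothing is left to match on, while the keyword-style variant keeps 'A' as a single term.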
Next, you can create the query with a QueryParser:
See this post about overriding the default operator.
// Create a query parser without a default field in this example (the first argument):
QueryParser queryParser = new QueryParser("", analyzer);
// Optionally, set the default operator to be AND (we leave it the default OR):
// https://stackoverflow.com/a/9084178/231860
// queryParser.setDefaultOperator(QueryParser.Operator.AND);
// Parse the query:
Query multiTermQuery = queryParser.parse("field_name1:\"field value 1\" AND field_name2:\"field value 2\"");
Alternatively, you can achieve the same thing by building the query yourself through the API:
See this tutorial on creating a BooleanQuery.
BooleanQuery multiTermQuery = new BooleanQuery();
multiTermQuery.add(new TermQuery(new Term("field_name1", "field value 1")), BooleanClause.Occur.MUST);
multiTermQuery.add(new TermQuery(new Term("field_name2", "field value 2")), BooleanClause.Occur.MUST);
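As a rough illustration of what the two MUST clauses select, the sketch below models the index as a plain Java list of field maps. No Lucene is involved, and the class MustDeleteModel and its methods are hypothetical names invented for this example. A document is removed only when every required field/value pair matches exactly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain Java collections standing in for an index; no Lucene involved.
class MustDeleteModel {
    // Removes every document whose fields match ALL required field/value pairs,
    // mirroring clauses combined with Occur.MUST. Returns the number of documents removed.
    static int deleteMatchingAll(List<Map<String, String>> docs, Map<String, String> required) {
        if (required.isEmpty()) {
            return 0; // assumption of this model: no clauses means nothing is deleted
        }
        int before = docs.size();
        docs.removeIf(doc -> required.entrySet().stream()
                .allMatch(e -> e.getValue().equals(doc.get(e.getKey()))));
        return before - docs.size();
    }

    // Helper that builds a two-field document using the field names from the question.
    static Map<String, String> doc(String value1, String value2) {
        Map<String, String> fields = new HashMap<>();
        fields.put("field_name1", value1);
        fields.put("field_name2", value2);
        return fields;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("field value 1", "field value 2")); // matches both fields
        docs.add(doc("field value 1", "other"));         // matches only the first field
        docs.add(doc("other", "field value 2"));         // matches only the second field

        int removed = deleteMatchingAll(docs, doc("field value 1", "field value 2"));
        System.out.println("removed=" + removed + " remaining=" + docs.size());
        // prints removed=1 remaining=2
    }
}
```

Documents that match only one of the two fields survive, which is the behaviour the question asks for and which a single-Term delete cannot express.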
When the key fields are numeric, you cannot use a TermQuery and must use a NumericRangeQuery instead.
See the answer to this question.
// NOTE: For IntFields, we need NumericRangeQueries:
// https://stackoverflow.com/a/14076439/231860
BooleanQuery multiTermQuery = new BooleanQuery();
multiTermQuery.add(NumericRangeQuery.newIntRange("field_name1", 1, 1, true, true), BooleanClause.Occur.MUST);
multiTermQuery.add(NumericRangeQuery.newIntRange("field_name2", 2, 2, true, true), BooleanClause.Occur.MUST);
Finally, we pass the query to the writer to delete the documents that match it:
See the answer to this question.
// Remove the document by using a multi key query:
// http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
writer.deleteDocuments(multiTermQuery);