Question

I am storing a document that has a URL field:

Document doc = new Document();
doc.add(new Field("url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("html", CompressionTools.compressString(html), Field.Store.YES));

I want to be able to find a document by its URL, but I get 0 results:

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
Query query = new QueryParser(LUCENE_VERSION, "url", analyzer).parse(url);
IndexSearcher searcher = new IndexSearcher(index, true);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// Display results
for (ScoreDoc hit : hits) {
  System.out.println("FOUND A MATCH");
}
searcher.close();

What can I do differently so that I can store HTML documents and find them by their URL?

Answer 1

You can rewrite the query like this:

// Build the term query directly, so the URL is matched as one exact, unparsed term.
Query query = new TermQuery(new Term("url", url));

Suggestion:

I would suggest using a BooleanQuery, since it performs well and is optimized internally.

TermQuery tq = new TermQuery(new Term("url", url));
// BooleanClause.Occur.SHOULD: use this operator for clauses that should appear in the matching documents.
BooleanQuery bq = new BooleanQuery();
bq.add(tq, BooleanClause.Occur.SHOULD);
IndexSearcher searcher = new IndexSearcher(index, true);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
// Search with the BooleanQuery built above.
searcher.search(bq, collector);

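I see you are indexing the URL field as NOT_ANALYZED, which IMO is a good fit for this kind of search: because no analyzer is applied, the value is indexed as a single term.

Now, if your business case is "I give you a URL, find the EXACT match in the Lucene index", then look into handling the field with a different analyzer (KeywordAnalyzer, etc.).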
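To make the "single term" point concrete, here is a toy model in plain Java. It is not Lucene code: the class `SingleTermSketch`, its methods, and the example URL are all hypothetical, and `crudeAnalyze` is only a rough stand-in for an analyzer (it does not reproduce StandardAnalyzer's actual rules). It just shows that a value indexed verbatim is found by an exact lookup of the whole string, while the pieces an analyzer would produce from the query text do not match it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model only (hypothetical names, not the Lucene API). */
class SingleTermSketch {
    // term -> ids of the documents containing that term, for a single field
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** NOT_ANALYZED-style indexing: the whole value becomes one term. */
    void indexVerbatim(int docId, String value) {
        postings.computeIfAbsent(value, k -> new ArrayList<>()).add(docId);
    }

    /** Exact term lookup, similar in spirit to a TermQuery. */
    List<Integer> lookup(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    /** Crude analyzer stand-in (assumption): lowercase, split on non-alphanumerics. */
    static List<String> crudeAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        SingleTermSketch index = new SingleTermSketch();
        String url = "http://example.com/Page?id=1";
        index.indexVerbatim(0, url);

        // The whole URL is the only term in the index, so this finds document 0.
        System.out.println("exact: " + index.lookup(url));

        // Tokens produced from the query text never equal the stored term.
        for (String token : crudeAnalyze(url)) {
            System.out.println(token + " -> " + index.lookup(token));
        }
    }
}
```

This is why the query side has to produce the exact same single term that was indexed, either by building the term directly or by using an analyzer that leaves the input whole.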

Answer 2

Lucene's QueryParser is interpreting some of the characters in the URL as part of the Query Parser Syntax. You can use a TermQuery instead, like this:

TermQuery query = new TermQuery(new Term("url", url));
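To see which characters in a URL collide with the query syntax, here is a small plain-Java sketch. It is not Lucene code: `QuerySyntaxSketch` and its methods are hypothetical, and the character list is my assumption of what the classic Lucene 3.x parser treats as special. Lucene itself ships a static `QueryParser.escape(String)` for this purpose; the `escape` method below only imitates the idea.

```java
/** Illustration only (hypothetical names, not the Lucene API). */
class QuerySyntaxSketch {
    // Assumed list of characters the classic query syntax treats as operators.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Returns the special characters found in the input, in order of appearance. */
    static String specialCharsIn(String s) {
        StringBuilder found = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                found.append(c);
            }
        }
        return found.toString();
    }

    /** Puts a backslash in front of every special character. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://example.com/page?id=1&x=a+b";
        System.out.println(specialCharsIn(url)); // prints :?&+
        System.out.println(escape(url));
    }
}
```

Note that escaping alone would probably not fix the original search: as far as I know, the parser still runs the text through the analyzer, so with StandardAnalyzer the URL would be split into tokens that cannot match the single NOT_ANALYZED term. That is why the TermQuery above is the simpler route.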

Searching by URL in Lucene
