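I know similar questions have been asked, but I couldn't find an answer that fits what I'm looking for.
Basically, I want to search for a phrase and only get matches where the whole field equals the complete phrase, not partial matches.
For example, if I search for "this is", a document containing "this is a phrase" should not be returned as a match.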
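Take the example from this question: Exact Phrase search using Lucene?
There, "foo bar" should not return a match, because it is only a partial match. The exact match I'm looking for would be "foo bar baz".
Here is the code; thanks to WhiteFang34 for posting it at the link above (I only converted it to C#):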
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

namespace LuceneStatic
{
    public static class LuceneStatic
    {
        // Returns the "contents" of every document that matches the phrase "foo bar".
        public static List<string> LucenePhraseQuery()
        {
            // set up Lucene to use an in-memory index
            Lucene.Net.Store.Directory directory = new RAMDirectory();
            Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
            var mlf = IndexWriter.MaxFieldLength.UNLIMITED;
            IndexWriter writer = new IndexWriter(directory, analyzer, true, mlf);

            // index a few documents
            writer.AddDocument(createDocument("1", "foo bar baz"));
            writer.AddDocument(createDocument("2", "red green blue"));
            writer.AddDocument(createDocument("3", "test foo bar test"));
            writer.Close();

            // search for documents that have "foo bar" in them
            string sentence = "foo bar";
            IndexSearcher searcher = new IndexSearcher(directory, true);
            PhraseQuery query = new PhraseQuery();
            // PhraseQuery terms are not analyzed, so lowercase them to match
            // the tokens StandardAnalyzer wrote to the index
            string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in words)
            {
                query.Add(new Term("contents", word.ToLowerInvariant()));
            }

            // collect the search results and return them to the caller
            List<string> results = new List<string>();
            TopDocs topDocs = searcher.Search(query, 10);
            foreach (ScoreDoc scoreDoc in topDocs.ScoreDocs)
            {
                Document doc = searcher.Doc(scoreDoc.doc);
                results.Add(doc.Get("contents"));
            }
            searcher.Close();
            return results;
        }

        // Builds a document with a stored, untokenized "id" and an analyzed "contents" field.
        private static Document createDocument(string id, string content)
        {
            Document doc = new Document();
            doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("contents", content, Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
            return doc;
        }
    }
}
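I have tried different analyzers and different approaches to this problem, but I can't get the result I need. I need to match the complete phrase "foo bar baz", while "foo bar" should return no matches.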
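Why the code above returns partial matches can be seen with a small, library-free model of a PhraseQuery with the default slop of 0: it hits any document in which the query terms appear at consecutive positions, regardless of what comes before or after them. This sketch is not Lucene.Net. The class `PhraseModel` and its methods are hypothetical, and the analyzer is approximated by lowercasing and splitting on whitespace, which is close to what StandardAnalyzer produces for inputs like these.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (not Lucene.Net) of an exact phrase match (slop = 0) over analyzed text.
// Assumption: analysis is approximated by lowercasing and splitting on whitespace.
public static class PhraseModel
{
    // Hypothetical stand-in for the analyzer's token stream.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ', '\t', '\r', '\n' },
                                      StringSplitOptions.RemoveEmptyEntries);

    // True if the phrase tokens occur at consecutive positions anywhere in the document.
    public static bool PhraseMatches(string document, string phrase)
    {
        string[] doc = Tokenize(document);
        string[] terms = Tokenize(phrase);
        if (terms.Length == 0) return false;

        for (int start = 0; start + terms.Length <= doc.Length; start++)
        {
            bool allMatch = true;
            for (int i = 0; i < terms.Length; i++)
            {
                if (doc[start + i] != terms[i]) { allMatch = false; break; }
            }
            if (allMatch) return true;
        }
        return false;
    }

    // Returns every document containing the phrase, in input order.
    public static List<string> Search(IEnumerable<string> documents, string phrase) =>
        documents.Where(d => PhraseMatches(d, phrase)).ToList();
}
```

With the three sample documents, `PhraseModel.Search(docs, "foo bar")` returns both "foo bar baz" and "test foo bar test", which is the partial-match behavior described above: a phrase query asks "does this sequence occur somewhere?", not "is this the whole value?".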
Answer
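When creating the field, index the data with the Field.Index.NOT_ANALYZED parameter. This causes the entire value to be indexed as a single Term.
You can then search for it with a simple TermQuery.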
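In terms of the code in the question, this amounts to changing `Field.Index.ANALYZED` to `Field.Index.NOT_ANALYZED` for the "contents" field in `createDocument` and replacing the PhraseQuery with `new TermQuery(new Term("contents", "foo bar baz"))`. The toy model below is not Lucene.Net (`ExactValueIndex` is a hypothetical name); it sketches the resulting semantics under the assumption that the index is a map from term text to document ids. The whole field value is the only term, so only an exact match of the entire value hits. Because a NOT_ANALYZED value is not run through the analyzer, it is not lowercased either, so the match is case-sensitive.

```csharp
using System;
using System.Collections.Generic;

// Toy model (not Lucene.Net) of a NOT_ANALYZED field searched with a TermQuery.
// Assumption: the index is a dictionary from term text to document ids, and a
// NOT_ANALYZED field contributes its whole, unmodified value as a single term.
public sealed class ExactValueIndex
{
    private readonly Dictionary<string, List<string>> _postings =
        new Dictionary<string, List<string>>(StringComparer.Ordinal);

    // Hypothetical stand-in for adding a document with one NOT_ANALYZED field.
    public void Add(string id, string value)
    {
        if (!_postings.TryGetValue(value, out var ids))
        {
            ids = new List<string>();
            _postings[value] = ids;
        }
        ids.Add(id);
    }

    // Stand-in for a TermQuery: only an exact, case-sensitive match of the whole value hits.
    public IReadOnlyList<string> TermSearch(string term) =>
        _postings.TryGetValue(term, out var ids)
            ? ids
            : (IReadOnlyList<string>)Array.Empty<string>();
}
```

Indexing the three sample values and calling `TermSearch("foo bar")` returns no ids, while `TermSearch("foo bar baz")` returns only "1". If case-insensitive whole-value matching is needed, one option is to lowercase the value yourself both when indexing and when building the term.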