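Edit
OK, I've solved it. When I added terms to the PhraseQuery object one at a time, it kept the common words, in this case "The".
What I did instead was use the QueryParser object to parse the query (including the quotation marks). That drops the common words, and the phrase query now works like a charm.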
List<string> searchList = Regex.Matches(searchTerms, @"(?<match>\w+)|\""(?<match>[\w\s]*)""")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList();
QueryParser parser = new QueryParser(LuceneFields.BODY, Analyzer);
BooleanQuery booleanQuery = new BooleanQuery();
// go through each term
foreach (string term in searchList)
{
    Query query = null;
    if (term.Contains(" ")) // multi word phrase: re-add the quotes so the parser builds the phrase query
        query = parser.Parse("\"" + term + "\"");
    else
        query = parser.Parse(term);
    // skip queries that come back empty after parsing
    if (query.ToString() != "")
        booleanQuery.Add(query, BooleanClause.Occur.MUST);
}
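To illustrate the mismatch, here is a rough sketch in plain C#. It is not Lucene code: the `PhraseTermSketch` class, the `Analyze` helper, and the shortened stop list are made up for illustration, on the assumption that the analyzer used at index time lowercases the text and drops stop words such as "the". Terms added to a PhraseQuery by hand skip that step, so they are compared against indexed terms that look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseTermSketch
{
    // Hypothetical, shortened stop list for illustration; not the analyzer's actual list.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "the", "a", "an", "of" };

    // Rough stand-in for index-time analysis: split on spaces, lowercase, drop stop words.
    public static List<string> Analyze(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(word => word.ToLowerInvariant())
                   .Where(word => !StopWords.Contains(word))
                   .ToList();
    }

    // True only if every given term also exists among the analyzed (indexed) terms.
    public static bool AllTermsIndexed(IEnumerable<string> terms, string indexedText)
    {
        List<string> indexedTerms = Analyze(indexedText);
        return terms.All(term => indexedTerms.Contains(term));
    }
}
```

With this sketch, `AllTermsIndexed(new[] { "The", "following", "table" }, "The following table")` is false because "The" never made it into the indexed terms, while passing `Analyze("The following table")` as the terms gives true. Letting the QueryParser build the phrase plays the role of that second call.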
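I'm building a simple search with Lucene.NET, and I'm having trouble getting phrase searching to work when I combine it with a BooleanQuery.
The following code performs the search: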
List<string> searchList = Regex.Matches(searchTerms, @"(?<match>\w+)|\""(?<match>[\w\s]*)""")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList();
QueryParser parser = new QueryParser(LuceneFields.BODY, Analyzer);
BooleanQuery booleanQuery = new BooleanQuery();
// go through each term
foreach (string term in searchList)
{
    Query query = null;
    if (term.Contains(" ")) // multi word phrase
    {
        query = new PhraseQuery();
        foreach (string str in term.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            ((PhraseQuery)query).Add(new Term(LuceneFields.BODY, str));
        }
    }
    else
        query = parser.Parse(term);
    string strQuery = query.ToString();
    if (strQuery != "")
        booleanQuery.Add(query, BooleanClause.Occur.MUST);
}
I've checked the query that gets built, and it looks fine:
+body:"The following table"
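I've also confirmed that this text really is in the Lucene index, as you can see from the results of a search for just "table".
I really have no idea what could be going wrong.
I create the index with the following code: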
Directory = FSDirectory.Open(new System.IO.DirectoryInfo(IndexDirectory));
Analyzer = new StandardAnalyzer(Version);
using (IndexWriter indexWriter = new IndexWriter(Directory, Analyzer, new IndexWriter.MaxFieldLength(Int32.MaxValue)))
{
    Response.Write("Adding document...");
    Document document = new Document();
    // Store the IDDataContent
    document.Add(new Field(LuceneFields.ID, id.ToString(), Field.Store.YES, Field.Index.ANALYZED));
    // store the url to the file itself
    document.Add(new Field(LuceneFields.HREF, FileURL, Field.Store.YES, Field.Index.ANALYZED));
    //document.Add(new Field(LuceneFields.TITLE, Article.Title, Field.Store.YES, Field.Index.ANALYZED));
    // store the text of the PDF
    document.Add(new Field(LuceneFields.BODY, PdfContents, Field.Store.YES, Field.Index.ANALYZED));
    indexWriter.AddDocument(document);
    indexWriter.Optimize();
    indexWriter.Commit();
}
Answer 0 (score: 0)
Answer 1 (score: 0)
Try using WhitespaceAnalyzer, because StandardAnalyzer doesn't work in this case.