I am trying to convert my search functionality to allow fuzzy searches involving multiple words. My existing search code looks like this:

    // Split the search into separate queries per word, and combine them into one major query
    var finalQuery = new BooleanQuery();
    string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string term in terms)
    {
        // Set up the fields to search
        string[] searchfields = new string[]
        {
            // Various strings denoting the document fields available
        };
        var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
        finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
    }

    // Perform the search
    var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
    var searcher = new IndexSearcher(directory, true);
    var hits = searcher.Search(finalQuery, MAX_RESULTS);

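This works correctly: if I have an entity whose name field is "My name is Andrew" and I search for "Andrew Name", Lucene finds the correct document. Now I want to enable fuzzy searching so that "Anderw Name" is also found. I changed my method to use the following code: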

    const int MAX_RESULTS = 10000;
    const float MIN_SIMILARITY = 0.5f;
    const int PREFIX_LENGTH = 3;

    if (string.IsNullOrWhiteSpace(searchString))
        throw new ArgumentException("Provided search string is empty");

    // Split the search into separate queries per word, and combine them into one major query
    var finalQuery = new BooleanQuery();
    string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string term in terms)
    {
        // Set up the fields to search
        string[] searchfields = new string[]
        {
            // Strings denoting document field names here
        };

        // Create a subquery where the term must match at least one of the fields
        var subquery = new BooleanQuery();
        foreach (string field in searchfields)
        {
            var queryTerm = new Term(field, term);
            var fuzzyQuery = new FuzzyQuery(queryTerm, MIN_SIMILARITY, PREFIX_LENGTH);
            subquery.Add(fuzzyQuery, BooleanClause.Occur.SHOULD);
        }

        // Add the subquery to the final query; every term's subquery must match
        finalQuery.Add(subquery, BooleanClause.Occur.MUST);
    }

    // Perform the search
    var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
    var searcher = new IndexSearcher(directory, true);
    var hits = searcher.Search(finalQuery, MAX_RESULTS);

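Unfortunately, with this code, if I submit the search query "Andrew Name" (the same as before), I get zero results.
The core idea is that every term must be found in at least one document field, but each term may be found in a different field. Does anyone know why my rewritten query fails?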
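Final edit: OK, it turns out I was overcomplicating things, and there was no need to change my first approach. After reverting to the first code snippet, I enabled fuzzy searching by changing

    finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);

to

    finalQuery.Add(parser.Parse(term.Replace("~", "") + "~"), BooleanClause.Occur.MUST);
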
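The string handling in that one-line change can be pulled into a small helper. This is a minimal sketch, assuming a hypothetical `FuzzyTerms.ToFuzzyTerms` method that is not part of Lucene.Net; it only prepares the strings, and each result would still be passed to `parser.Parse` as in the first snippet. Skipping words that consist only of "~" is an added assumption, not something the original code does.

```csharp
using System;
using System.Collections.Generic;

public static class FuzzyTerms
{
    // Hypothetical helper (not a Lucene.Net API): splits the search string on
    // spaces and appends the query-parser fuzzy operator "~" to each word,
    // after stripping any "~" the user typed so the operator is not doubled.
    public static List<string> ToFuzzyTerms(string searchString)
    {
        var result = new List<string>();
        if (string.IsNullOrWhiteSpace(searchString))
            return result;

        string[] words = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            string cleaned = word.Replace("~", "");

            // Assumption: a word made only of "~" leaves nothing to search for, so skip it.
            if (cleaned.Length > 0)
                result.Add(cleaned + "~");
        }
        return result;
    }
}
```

For the input "Anderw Name" this produces the two strings "Anderw~" and "Name~". Other query-parser special characters are left untouched here, so escaping them is still up to the caller.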
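Answer 0 (score: 3)
Your code works for me if I rewrite searchString to lowercase. I am assuming you use StandardAnalyzer when indexing, which produces lowercase terms. A Term passed directly to FuzzyQuery is not analyzed, so "Andrew" is compared as-is against the lowercase terms in the index.
You need to 1) pass your tokens through the same analyzer (to get identical processing), 2) apply the same logic as the analyzer yourself, or 3) use an analyzer that matches the processing you do (WhitespaceAnalyzer).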
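For simple input, option 2 can be approximated by lowercasing each word before it is wrapped in a Term. This is a minimal sketch, assuming a hypothetical `SearchTermNormalizer.NormalizeTerms` helper; it mimics only the lowercasing step of StandardAnalyzer, not its tokenization or stop-word removal, so it is not a full substitute for running the analyzer.

```csharp
using System;
using System.Linq;

public static class SearchTermNormalizer
{
    // Hypothetical helper (not a Lucene.Net API): splits the search string on
    // spaces and lowercases each word so it can match lowercase indexed terms.
    public static string[] NormalizeTerms(string searchString)
    {
        if (string.IsNullOrWhiteSpace(searchString))
            return new string[0];

        return searchString
            .Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => word.ToLowerInvariant())
            .ToArray();
    }
}
```

In the question's second snippet, the loop would then iterate over `NormalizeTerms(searchString)` instead of the raw split result, so `new Term(field, term)` receives "andrew" rather than "Andrew".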
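Answer 1 (score: 1)
You want this line:

    var queryTerm = new Term(term);

to look like this:

    var queryTerm = new Term(field, term);

Right now you are searching the field named by term (which probably does not exist) for the empty string (which will never be found).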