I'm running the latest version of Lucene.Net (3.0.3). (I also tagged this lucene, since the architecture is basically the same.)
I have the following Lucene.Net.Analysis.Analyzer class:
    public sealed class LowerCaseKeywordAnalyzer : Lucene.Net.Analysis.KeywordAnalyzer
    {
        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            var keywordTokenizer = base.TokenStream(fieldName, reader);
            var asciiFoldingFilter = new ASCIIFoldingFilter(keywordTokenizer);
            var lowerCaseFilter = new LowerCaseFilter(asciiFoldingFilter);
            return lowerCaseFilter;
        }
    }
Besides normalizing case, this analyzer also folds special characters, e.g. Außendienst becomes aussendienst.
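To see what the analyzer actually emits for a given input, the token stream can be walked by hand. This is only a minimal sketch: `AnalyzerProbe` and `PrintTokens` are hypothetical names (not part of Lucene.Net), and it assumes the Lucene.Net 3.0.3 attribute API (`ITermAttribute`).

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class AnalyzerProbe
{
    // Writes every token the analyzer produces for the given text to the console.
    public static void PrintTokens(Analyzer analyzer, string fieldName, string text)
    {
        using (TokenStream stream = analyzer.TokenStream(fieldName, new StringReader(text)))
        {
            // The attribute instance is updated in place on each IncrementToken() call.
            ITermAttribute termAttribute = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                Console.WriteLine(termAttribute.Term);
            }
            stream.End();
        }
    }
}
```

Calling `AnalyzerProbe.PrintTokens(new LowerCaseKeywordAnalyzer(), "Test", "Außendienst")` should print the single folded token described above.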
Now I'd like to search this field with a prefix query. (I tried Lucene.Net.Search.PrefixQuery first, but that class doesn't let me inject an analyzer.) This is what I'm doing now:
    var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
    var prefixEscapedLowerCaseSearchPattern = string.Concat(escapedLowerCaseSearchPattern, "*");
    var queryParser = new QueryParser(/* my lucene version*/,
                                      fieldName,
                                      /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
    var query = queryParser.Parse(prefixEscapedLowerCaseSearchPattern);
First test case
searchPattern: Auß
fieldName: Test
Actual:
{Test:auß*}
Expected:
{Test:auss*}
Second test case
searchPattern: Auß test
fieldName: Test
Actual:
{Test:auß Test:test*}
Expected:
{Test:auss test*}
So, how can I use my LowerCaseKeywordAnalyzer with Lucene.Net.QueryParsers.QueryParser to get the expected result? (Or is there another solution?)
Answer 0 (score: 0)
OK, I tried this:
    var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
    var prefixEscapedLowerCaseSearchPattern = string.Concat("\"", escapedLowerCaseSearchPattern, "*\"");
    var queryParser = new QueryParser(/* my lucene version */,
                                      fieldName,
                                      /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
    var query = queryParser.Parse(prefixEscapedLowerCaseSearchPattern);
This generates what looks like a perfectly valid query:
{Test:auss*}
but it doesn't actually work...
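One plausible explanation, offered as an assumption rather than something verified against this index: inside quotes the parser treats `*` as literal text, so the result may be a TermQuery for the exact term `auss*` instead of a PrefixQuery, even though both print identically. Printing the runtime type alongside the query text is a quick way to check. `QueryTypeProbe` and `Describe` are hypothetical names, and `LUCENE_30` is an assumed version constant.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class QueryTypeProbe
{
    // Parses queryText and prints the query's string form together with its runtime type,
    // because two different query classes can share the same ToString() output.
    public static void Describe(Analyzer analyzer, string fieldName, string queryText)
    {
        // LUCENE_30 is an assumption; use whatever version the index was built with.
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, fieldName, analyzer);
        Query query = parser.Parse(queryText);
        Console.WriteLine("{0} -> {1} ({2})", queryText, query, query.GetType().Name);
    }
}
```

Running it once with `"\"Auß*\""` and once with `"Auß*"` would show whether the two inputs really produce different query classes.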
Then I remembered that I did get results with Lucene.Net.Search.PrefixQuery when the searchPatterns contained no umlauts...
So I thought: well, just take the Lucene.Net.Index.Term from the parsed Lucene.Net.Search.TermQuery and use it in a Lucene.Net.Search.PrefixQuery:
    var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
    // Quote the pattern (without a trailing *) so the parser runs it through the analyzer.
    var prefixEscapedLowerCaseSearchPattern = string.Concat("\"", escapedLowerCaseSearchPattern, "\"");
    var queryParser = new QueryParser(/* my lucene version */,
                                      fieldName,
                                      /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
    var termQuery = (TermQuery) queryParser.Parse(prefixEscapedLowerCaseSearchPattern);
    // Reuse the analyzed term as the prefix.
    var term = termQuery.Term;
    var prefixQuery = new PrefixQuery(term);
BOOOM!
This generates the same query ({Test:auss*}), but somehow it returns results... I don't know why, but...
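A variation on the same idea skips QueryParser and the TermQuery cast entirely: run the search pattern through the analyzer by hand and wrap the resulting text in a PrefixQuery. This is a minimal sketch, not a tested solution. `AnalyzedPrefixQuery` and `Build` are hypothetical names, and it assumes the analyzer emits exactly one token per input, which holds for a keyword-based analyzer such as LowerCaseKeywordAnalyzer.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class AnalyzedPrefixQuery
{
    // Analyzes searchPattern with the given analyzer and uses the first token as the prefix.
    public static PrefixQuery Build(Analyzer analyzer, string fieldName, string searchPattern)
    {
        using (TokenStream stream = analyzer.TokenStream(fieldName, new StringReader(searchPattern)))
        {
            ITermAttribute termAttribute = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            if (!stream.IncrementToken())
            {
                // Assumption: an analyzer that yields no token is treated as a caller error here.
                throw new ArgumentException("The analyzer produced no token.", "searchPattern");
            }
            string analyzedPrefix = termAttribute.Term;
            stream.End();
            return new PrefixQuery(new Term(fieldName, analyzedPrefix));
        }
    }
}
```

No QueryParser.Escape call is needed on this path, because the pattern never goes through the query syntax. With a keyword-based analyzer the whole input, spaces included, becomes a single prefix, which is the shape the second test case in the question expects.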