How to use ASCIIFoldingFilter with QueryParser

Date: 2013-11-19 12:48:42

Tags: lucene lucene.net

I'm running the latest version of Lucene.Net (3.0.3). (I also tagged lucene, because it's basically the same architecture...)

I have the following Lucene.Net.Analysis.Analyzer class:

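using System.IO;
using Lucene.Net.Analysis;

// Keyword analyzer that additionally folds non-ASCII characters and lower-cases the token
public sealed class LowerCaseKeywordAnalyzer : Lucene.Net.Analysis.KeywordAnalyzer
{
    public override TokenStream TokenStream(string fieldName,
                                            TextReader reader)
    {
        // KeywordAnalyzer emits the whole input as one single token
        var keywordTokenizer = base.TokenStream(fieldName,
                                                reader);
        // Replace non-ASCII characters by ASCII equivalents, e.g. "ß" -> "ss"
        var asciiFoldingFilter = new ASCIIFoldingFilter(keywordTokenizer);
        var lowerCaseFilter = new LowerCaseFilter(asciiFoldingFilter);

        return lowerCaseFilter;
    }
}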

Apart from lower-casing, this analyzer folds any special characters, e.g. Außendienst becomes aussendienst.
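To make the analyzer chain concrete, here is a toy model in plain C# with no Lucene.Net dependency. `ToyAnalyzer` and its folding table are hypothetical: the table covers only a few German characters, whereas the real ASCIIFoldingFilter maps far more. Like KeywordTokenizer, the model keeps the whole input as one token; it folds first and lower-cases second, in the same order as the analyzer above.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical toy model of KeywordTokenizer -> ASCIIFoldingFilter -> LowerCaseFilter.
public static class ToyAnalyzer
{
    // Hypothetical, tiny subset of a folding table (the real filter covers many more characters).
    private static readonly Dictionary<char, string> Folds = new Dictionary<char, string>
    {
        ['ß'] = "ss",
        ['ä'] = "a", ['Ä'] = "A",
        ['ö'] = "o", ['Ö'] = "O",
        ['ü'] = "u", ['Ü'] = "U",
    };

    // The whole input is treated as a single token (no splitting on whitespace).
    public static string Analyze(string input)
    {
        var folded = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Step 1: folding, replace a character by its ASCII equivalent if the table knows one.
            if (Folds.TryGetValue(c, out string replacement))
                folded.Append(replacement);
            else
                folded.Append(c);
        }
        // Step 2: lower-casing, applied after folding.
        return folded.ToString().ToLowerInvariant();
    }
}
```

With this model, `ToyAnalyzer.Analyze("Außendienst")` yields `aussendienst`, and `"Auß test"` stays a single token, `auss test`.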

Now I want to search this field with a "prefix query" (I tried Lucene.Net.Search.PrefixQuery before, but that class doesn't let me inject an analyzer). This is what I'm doing now:

var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
var prefixEscapedLowerCaseSearchPattern = string.Concat(escapedLowerCaseSearchPattern,
                                                        "*");
var queryParser = new QueryParser(/* my lucene version*/,
                                  fieldName,
                                  /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
var query = queryParser.Parse(prefixEscapedLowerCaseSearchPattern);

First test case

searchPattern: Auß
fieldName: Test

Actual:

{Test:auß*}

Expected:

{Test:auss*}

Second test case

searchPattern: Auß test
fieldName: Test

Actual:

{Test:auß Test:test*}

Expected:

{Test:auss test*}

So, how can I use my LowerCaseKeywordAnalyzer with Lucene.Net.QueryParsers.QueryParser to get the expected results? (Or is there another solution?)

1 answer:

Answer 0 (score: 0)

Well, I tried this:

var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
var prefixEscapedLowerCaseSearchPattern = string.Concat("\"",
                                                        escapedLowerCaseSearchPattern,
                                                        "*\"");
var queryParser = new QueryParser(/* my lucene version */,
                                  fieldName,
                                  /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
var query = queryParser.Parse(prefixEscapedLowerCaseSearchPattern);

This generates what looks like a perfectly valid query

{Test:auss*}

but it really doesn't work...
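A likely explanation, sketched as a toy model: inside double quotes the `*` is not a wildcard. The parser hands the whole quoted text, asterisk included, to LowerCaseKeywordAnalyzer, which emits the single token `auss*`, so the result is a TermQuery for the literal term `auss*`. Its string form `Test:auss*` happens to look exactly like that of a PrefixQuery, but it only matches an indexed term that literally ends in an asterisk. The model below contrasts exact (TermQuery-like) and prefix (PrefixQuery-like) matching; `ToyMatching` and its indexed terms are hypothetical, assumed to be already analyzed at index time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model of how the two query types match indexed terms.
public static class ToyMatching
{
    // Hypothetical indexed terms, already folded and lower-cased at index time.
    public static readonly string[] IndexedTerms = { "aussendienst", "aussage", "innendienst" };

    // TermQuery-like: the query text must be equal to an indexed term.
    public static List<string> TermMatch(string text) =>
        IndexedTerms.Where(t => t == text).ToList();

    // PrefixQuery-like: every indexed term starting with the prefix matches.
    public static List<string> PrefixMatch(string prefix) =>
        IndexedTerms.Where(t => t.StartsWith(prefix, StringComparison.Ordinal)).ToList();
}
```

Here `TermMatch("auss*")` finds nothing, while `PrefixMatch("auss")` finds `aussendienst` and `aussage`, which mirrors the difference between the quoted query and a real PrefixQuery built from the term `auss`.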

I remembered that I did get results with Lucene.Net.Search.PrefixQuery when I used search patterns without umlauts...
Then I thought... well... just take the Lucene.Net.Index.Term from my Lucene.Net.Search.TermQuery and put it into a Lucene.Net.Search.PrefixQuery:

var escapedLowerCaseSearchPattern = QueryParser.Escape(searchPattern);
// Quoted, without "*": the analyzer turns the whole pattern into one term, e.g. "auss"
var prefixEscapedLowerCaseSearchPattern = string.Concat("\"",
                                                        escapedLowerCaseSearchPattern,
                                                        "\"");
var queryParser = new QueryParser(/* my lucene version */,
                                  fieldName,
                                  /* a reference to a static instance of my LowerCaseKeywordAnalyzer */);
var termQuery = (TermQuery) queryParser.Parse(prefixEscapedLowerCaseSearchPattern);
var term = termQuery.Term;
var prefixQuery = new PrefixQuery(term);

{Test:auss*}

BOOOM!

This generates the same query ({Test:auss*}), but somehow it returns results... I don't know why, but...
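The approach that finally works boils down to two steps: run the complete search pattern through the same analyzer that was used at index time, then use the analyzed text as a prefix. Two QueryParser behaviors get in the way otherwise: outside quotes it splits the input on whitespace (hence the two clauses in the second test case), and it lower-cases prefix and wildcard terms without passing them through the analyzer (hence `auß*`). The toy below models only the two steps; `ToyPrefixSearch`, the indexed terms and the stand-in `analyze` function are hypothetical. In Lucene.Net itself, the equivalent would presumably be to read the single token from the analyzer's TokenStream and pass it to `new PrefixQuery(new Term(fieldName, analyzedText))`, skipping QueryParser entirely; that variant is an untested suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model: analyze first, then match by prefix.
public static class ToyPrefixSearch
{
    // `analyze` stands in for the real analyzer; it must be the same chain used at index time.
    public static List<string> Search(IEnumerable<string> indexedTerms,
                                      string searchPattern,
                                      Func<string, string> analyze)
    {
        // The whole pattern becomes one keyword token, e.g. "Auß test" -> "auss test".
        string prefix = analyze(searchPattern);
        return indexedTerms.Where(t => t.StartsWith(prefix, StringComparison.Ordinal)).ToList();
    }
}
```

Usage, with a deliberately minimal stand-in analyzer that only folds `ß`: searching `"Auß"` over the terms `aussendienst`, `auss testlauf`, `innendienst` returns the first two, and `"Auß test"` returns only `auss testlauf`, which corresponds to the expected `{Test:auss test*}`.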