NEST custom NGram analyzer not returning the correct results

Date: 2017-11-21 14:16:31

Tags: elasticsearch elasticsearch-5

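I'm trying to build an autocomplete feature. I thought it was working, but I noticed that my custom analyzer sometimes returns strange results.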

var response = this.client.CreateIndex(
                    ElasticConfig.IndexName,
                    index => index
                        .Mappings(
                            ms => ms.Map<EmployeeDocument>(
                                m => m.Properties(
                                    p => p
                                        .Text(t => t.Name(n => n.EmpFirstName).Analyzer("auto-complete").Fields(ff => ff.Keyword(k => k.Name("keyword"))))
                                        .Text(t => t.Name(n => n.pkEmpID).Analyzer("auto-complete").Fields(ff => ff.Keyword(k => k.Name("keyword"))))
                                        .Text(t => t.Name(n => n.Description).Analyzer("auto-complete").Fields(ff => ff.Keyword(k => k.Name("keyword")))))))
                            .Settings(
                            f => f.Analysis(
                                    analysis => analysis
                                    .Tokenizers(
                                        tokenizers => 
                                        tokenizers
                                            .EdgeNGram("ngram", t => t.MinGram(3).MaxGram(5)))
                                   .Analyzers(
                                        analyzers => analyzers.Custom(
                                            "auto-complete",
                                            a => a.Filters(new List<string> { "lowercase", "ngram" }).Tokenizer("standard")))))); 

If I call

127.0.0.1:9200/default-index/_analyze?text=dan&analyzer=auto-complete

then I get

{
    "tokens": [
        {
            "token": "d",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "da",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "a",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "an",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "n",
            "start_offset": 0,
            "end_offset": 3,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

I set MinGram to 3, so the output above is clearly wrong. Am I missing a setting?

1 answer:

Answer 0 (score: 0)

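OK, so I changed the way the tokens are created. Note how I use TokenFilters instead of Tokenizers. As far as I can tell, the "ngram" name in the original Filters list never referred to the tokenizer I had defined under that name: names in Filters are looked up as token filters, so it resolved to the built-in ngram token filter, whose defaults (min_gram 1, max_gram 2) match the tokens in the question. The analyzer's tokenizer was "standard" the whole time, so the custom EdgeNGram tokenizer was never used.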

var response = this.client.CreateIndex(
                    ElasticConfig.IndexName,
                    index => index.Mappings(
                        ms => ms.Map<EmployeeDocument>(
                            m => m.Properties(
                                p => p
                                    .Text(t => t.Name(n => n.EmpFirstName).Analyzer("auto-complete").Fields(ff => ff.Keyword(k => k.Name("keyword"))))
                                    .Text(t => t.Name(n => n.pkEmpID).Analyzer("auto-complete-id").Fields(ff => ff.Keyword(k => k.Name("keyword"))))
                                    .Text(t => t.Name(n => n.Description).Analyzer("auto-complete").Fields(ff => ff.Keyword(k => k.Name("keyword")))))))
                        .Settings(f => f.Analysis(
                            analysis => analysis
                                .Analyzers(
                                    analyzers => analyzers
                                        .Custom("auto-complete", a => a.Tokenizer("standard").Filters("lowercase", "auto-complete-filter"))
                                        .Custom("auto-complete-id", a => a.Tokenizer("standard").Filters("lowercase", "auto-complete-id-filter")))
                                .TokenFilters(
                                    tokenFilter => tokenFilter
                                        .EdgeNGram("auto-complete-filter", t => t.MinGram(3).MaxGram(5))
                                        .EdgeNGram("auto-complete-id-filter", t => t.MinGram(1).MaxGram(5))))));
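To see why the two setups behave differently, here is a minimal standalone sketch (plain C#, no NEST or Elasticsearch involved) that mimics the two kinds of gram generation on a single token. The class and method names (NGramSketch, NGrams, EdgeNGrams) are made up for illustration. It assumes the built-in ngram token filter defaults of min_gram 1 and max_gram 2, and it skips the lowercase step because the sample input is already lowercase. It only approximates what the real filters do and ignores offsets and positions.

```csharp
using System;
using System.Collections.Generic;

public static class NGramSketch
{
    // Hypothetical helper: every substring of length minGram..maxGram,
    // taken at each start position (roughly what an ngram filter emits).
    public static List<string> NGrams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int start = 0; start < token.Length; start++)
        {
            for (int len = minGram; len <= maxGram && start + len <= token.Length; len++)
            {
                grams.Add(token.Substring(start, len));
            }
        }
        return grams;
    }

    // Hypothetical helper: only prefixes of length minGram..maxGram
    // (roughly what an edge ngram filter emits).
    public static List<string> EdgeNGrams(string token, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int len = minGram; len <= maxGram && len <= token.Length; len++)
        {
            grams.Add(token.Substring(0, len));
        }
        return grams;
    }

    public static void Main()
    {
        // Assumed defaults of the built-in "ngram" filter: min 1, max 2.
        Console.WriteLine(string.Join(", ", NGrams("dan", 1, 2)));        // d, da, a, an, n
        // The edge ngram filter from the answer: min 3, max 5.
        Console.WriteLine(string.Join(", ", EdgeNGrams("dan", 3, 5)));    // dan
        Console.WriteLine(string.Join(", ", EdgeNGrams("daniel", 3, 5))); // dan, dani, danie
    }
}
```

The first line of output reproduces the five tokens from the question, which is what you would expect if "ngram" was picked up as the default ngram filter. With the edge ngram settings from the answer, "dan" yields a single token, and longer names yield prefixes of 3 to 5 characters.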