C# NEST Elasticsearch custom filter structure (tokenize)

Asked: 2017-12-05 02:34:32

Tags: c# filter token nest full-text-indexing

I am trying to rewrite this particular query in C# NEST, but I am stuck on defining the filters, and I am confused:

{
   "settings":{
      "analysis":{
         "filter":{
            "lemmagen_filter_sk":{
               "type":"lemmagen",
               "lexicon":"sk"
            },
            "synonym_filter":{
               "type":"synonym",
               "synonyms_path":"synonyms/sk_SK.txt",
               "ignore_case":true
            },
            "stopwords_SK":{
               "type":"stop",
               "stopwords_path":"stop-words/stop-words-slovak.txt",
               "ignore_case":true
            }
         },
         "analyzer":{
            "slovencina_synonym":{
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "stopwords_SK",
                  "lemmagen_filter_sk",
                  "lowercase",
                  "stopwords_SK",
                  "synonym_filter",
                  "asciifolding"
               ]
            },
            "slovencina":{
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "stopwords_SK",
                  "lemmagen_filter_sk",
                  "lowercase",
                  "stopwords_SK",
                  "asciifolding"
               ]
            },

I would like to have the correct client.CreateIndex(...) call with the right index settings. All I have right now is:

client.CreateIndex(indexName, c => c
    .InitializeUsing(indexConfig)
    .Mappings(m => m
        .Map<T>(mp => mp.AutoMap())));

I can't find any information on how to do this. I would appreciate any help.

Edit:

client.CreateIndex(indexName, c => c
    .InitializeUsing(indexConfig)
    .Settings(s => s
        .Analysis(a => a
            .TokenFilters(t => t
                .UserDefined("lemmagen_filter_sk",
                    new LemmagenTokenFilter { Lexicon = "sk" })
                .Synonym("synonym_filter", ts => ts
                    .SynonymsPath("synonyms/sk_SK.txt")
                    .IgnoreCase(true))
                .Stop("stopwords_sk", tst => tst
                    .StopWordsPath("stop-words/stop-words-slovak")
                    .IgnoreCase(true))
            )
            .Analyzers(aa => aa
                .Custom("slovencina_synonym", acs => acs
                    .Tokenizer("standard")
                    .Filters("stopwords_SK", "lemmagen_filter_sk", "lowercase", "stopwords_SK", "synonym_filter", "asciifolding")
                )
                .Custom("slovencina", acs => acs
                    .Tokenizer("standard")
                    .Filters("stopwords_SK", "lemmagen_filter_sk", "lowercase", "stopwords_SK", "asciifolding")
                )
            )
        )
    )
    .Mappings(m => m
        .Map<DealItem>(mp => mp
            .AutoMap()
            .Properties(p => p
                .Text(t => t
                    .Name(n => n.title_dealitem)
                    .Name(n => n.coupon_text1)
                    .Name(n => n.coupon_text2)
                    .Analyzer("slovencina_synonym")
                )
            )
        )
    )
);

This is what I have now, but I get an error when I try to use one of the analyzers:

POST dealitems/_analyze
{
  "analyzer": "slovencina",
  "text":     "Janko kúpil nové topánky"
}

ERROR:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[myNode][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to find analyzer [slovencina]"
  },
  "status": 400
}

Also, GET _settings does not show any analyzers.

Result: the problem was missing files; the paths were wrong.
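
Elasticsearch reads the files behind synonyms_path and stopwords_path when the index is created, so a wrong path makes the create-index request itself fail. NEST does not throw on server-side errors by default, which means that failure is easy to miss unless the response is inspected. A minimal sketch of such a check, assuming NEST 5.x/6.x (where CreateIndex returns ICreateIndexResponse); the helper name EnsureCreated is made up for illustration:

```csharp
using System;
using Nest;

public static class IndexCreationCheck
{
    // Hypothetical helper: turns a failed create-index response into an exception
    // so that a bad synonyms or stop-words path surfaces right away.
    public static void EnsureCreated(ICreateIndexResponse response)
    {
        if (!response.IsValid)
        {
            // DebugInformation includes the request details and the server error, if any.
            throw new InvalidOperationException(
                "Index creation failed: " + response.DebugInformation);
        }
    }
}
```

It could be used as IndexCreationCheck.EnsureCreated(client.CreateIndex(indexName, c => c ...)); before running _analyze against the index.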

1 Answer:

Answer 0 (score: 3)

NEST does not ship a lemmagen token filter out of the box. Fortunately, you can easily create your own:

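using Nest;
using Newtonsoft.Json;

// Custom token filter definition for the lemmagen plugin.
public class LemmagenTokenFilter : ITokenFilter
{
    public string Version { get; set; }

    // Serialized as "type": "lemmagen" in the index settings.
    public string Type => "lemmagen";

    [JsonProperty("lexicon")]
    public string Lexicon { get; set; }
}

// Register the filter under a name via UserDefined.
var response = elasticClient.CreateIndex(_defaultIndex, d => d
    .Settings(s => s
        .Analysis(a => a
            .TokenFilters(t => t
                .UserDefined("lemmagen_filter_sk", new LemmagenTokenFilter { Lexicon = "sk" }))))
    // ... (rest of the index definition elided in the original answer)
);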
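
Putting the custom filter together with the settings from the question, the whole call might look roughly like the sketch below. It assumes NEST 5.x/6.x, an installed lemmagen plugin, and that both files exist under the Elasticsearch config directory at the paths given in the question's JSON. The DealItem class shown here is a hypothetical stand-in with only the three analyzed fields, and DealIndex.Create is an illustrative name. Compared with the code in the question's edit, three things differ: the stop filter is registered as "stopwords_SK", the same casing the analyzers reference (the edit registers "stopwords_sk"); the stop-words path carries the .txt extension from the JSON; and each field gets its own Text mapping, since chaining several Name calls on one descriptor most likely keeps only the last name.

```csharp
using Nest;
using Newtonsoft.Json;

// Hypothetical document type; only the three analyzed fields from the question.
public class DealItem
{
    public string title_dealitem { get; set; }
    public string coupon_text1 { get; set; }
    public string coupon_text2 { get; set; }
}

// Same custom filter as in the answer.
public class LemmagenTokenFilter : ITokenFilter
{
    public string Version { get; set; }
    public string Type => "lemmagen";

    [JsonProperty("lexicon")]
    public string Lexicon { get; set; }
}

public static class DealIndex
{
    public static ICreateIndexResponse Create(IElasticClient client, string indexName)
    {
        return client.CreateIndex(indexName, c => c
            .Settings(s => s
                .Analysis(a => a
                    .TokenFilters(t => t
                        .UserDefined("lemmagen_filter_sk",
                            new LemmagenTokenFilter { Lexicon = "sk" })
                        .Synonym("synonym_filter", sy => sy
                            .SynonymsPath("synonyms/sk_SK.txt")
                            .IgnoreCase(true))
                        // Name matches the casing used in the analyzers below.
                        .Stop("stopwords_SK", st => st
                            .StopWordsPath("stop-words/stop-words-slovak.txt")
                            .IgnoreCase(true)))
                    .Analyzers(an => an
                        .Custom("slovencina_synonym", ca => ca
                            .Tokenizer("standard")
                            .Filters("stopwords_SK", "lemmagen_filter_sk", "lowercase",
                                     "stopwords_SK", "synonym_filter", "asciifolding"))
                        .Custom("slovencina", ca => ca
                            .Tokenizer("standard")
                            .Filters("stopwords_SK", "lemmagen_filter_sk", "lowercase",
                                     "stopwords_SK", "asciifolding")))))
            .Mappings(m => m
                .Map<DealItem>(mp => mp
                    .AutoMap()
                    // One Text mapping per field, each with the synonym analyzer.
                    .Properties(p => p
                        .Text(t => t.Name(n => n.title_dealitem).Analyzer("slovencina_synonym"))
                        .Text(t => t.Name(n => n.coupon_text1).Analyzer("slovencina_synonym"))
                        .Text(t => t.Name(n => n.coupon_text2).Analyzer("slovencina_synonym"))))));
    }
}
```

Checking response.IsValid on the returned value is a cheap way to confirm that the analyzers were actually created before calling _analyze.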

Hope that helps.