Custom predefined stopword list in Elasticsearch

Time: 2017-01-14 07:42:50

Tags: elasticsearch lucene stop-words

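How can I define a custom stopword list globally, so that it is accessible from all indices?

Ideally, I would reference this list by name, the same way we reference the predefined language-specific stopword lists: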

PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "my_stop": {
                    "type":       "stop",
                    "stopwords":  "_my_predefined_stopword_list_"
                }
            }
        }
    }
}

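1 answer:

Answer 0 (score: 1)

The official Elastic documentation describes how to create a custom stop filter with your own stopword list. You can find the description here: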

https://www.elastic.co/guide/en/elasticsearch/guide/current/using-stopwords.html

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stop": {
          "type":        "stop",
          "stopwords": [ "si", "esta", "el", "la" ]  
        },
        "light_spanish": { 
          "type":     "stemmer",
          "language": "light_spanish"
        }
      },
      "analyzer": {
        "my_spanish": {
          "tokenizer": "standard",
          "filter": [ 
            "lowercase",
            "asciifolding",
            "spanish_stop",
            "light_spanish"
          ]
        }
      }
    }
  }
}
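To check what the filter chain above actually does, the `_analyze` API can be run against the index. This is only a sketch: the sample sentence is made up for illustration, and it assumes the index was created with the settings shown above.

```
GET /my_index/_analyze
{
  "analyzer": "my_spanish",
  "text": "El perro está en la casa"
}
```

If the `spanish_stop` filter is applied, the response should contain no tokens for "el", "está" (folded to "esta" by `asciifolding`), or "la"; the remaining terms come back in the form produced by the `light_spanish` stemmer.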

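Once the spanish_stop filter is defined, you can reference it by name in the analyzers of that index. Note that the filter belongs to the settings of this one index, so the definition has to be repeated for every index that needs it.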
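As far as I know, there is no built-in setting for registering a new named list such as `_my_predefined_stopword_list_` next to the language-specific ones. A common approximation is to keep the words in a file and point the stop filter at it with `stopwords_path`. The sketch below assumes a hypothetical file `stopwords/my_stopwords.txt` (UTF-8, one stopword per line) placed under the Elasticsearch config directory on every node; the file name and location are illustrative, not something from the question.

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type":           "stop",
          "stopwords_path": "stopwords/my_stopwords.txt"
        }
      }
    }
  }
}
```

Each index still has to declare the `my_stop` filter, but the word list itself lives in one place and can be shared by all of them.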