Elasticsearch - How to remove s from the end of words

Time: 2016-06-16 18:40:46

Tags: elasticsearch elasticsearch-2.0

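Using Elasticsearch 2.2, as a simple experiment, I want to remove the last character from any word that ends with a lowercase "s". For example, the word "sounds" would be indexed as "sound".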

I am defining my analyzer like this:

{
  "template": "document-index-template",
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)([s]( |$))",
          "replacement": "$2"
        }
      },
      "analyzer": {
        "tight": {
          "type": "standard",
          "filter": [
            "sFilter",
            "lowercase"
          ]
        }
      }
    }
  }
}

Then, when I analyze the phrase "sounds of silences" with this request:

<index>/_analyze?analyzer=tight&text=sounds%20of%20silences

I get:

{
   "tokens": [
      {
         "token": "sounds",
         "start_offset": 0,
         "end_offset": 6,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "of",
         "start_offset": 7,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "silences",
         "start_offset": 10,
         "end_offset": 18,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

I was expecting "sounds" to become "sound" and "silences" to become "silence".

1 answer:

Answer 0: (score: 3)

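The analyzer settings above are invalid: an analyzer of type standard does not take a filter list, so sFilter is never applied. I think what you intended to use is an analyzer of type custom with its tokenizer set to standard.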

Example:

{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)s$",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "tight": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "sFilter"
          ]
        }
      }
    }
  }
}
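
To see how the regex part behaves without a running cluster, here is a rough Python sketch that applies a replacement to each token separately, the way a token filter does. It is only an approximation: Python's re module stands in for the Java regex engine that Elasticsearch uses, the `\w+` split is a crude substitute for the standard tokenizer, and the `analyze` helper and the extra word "system" are made up for illustration. It compares the pattern from the question, an unanchored `([a-zA-Z]+)s`, and an end-anchored `([a-zA-Z]+)s$`.

```python
import re

def analyze(text, pattern, replacement):
    """Apply a regex replacement to each token of `text` separately."""
    # Rough stand-in for the standard tokenizer: keep runs of word characters.
    tokens = re.findall(r"\w+", text)
    # A token filter sees one token at a time, so the regex runs per token.
    return [re.sub(pattern, replacement, token) for token in tokens]

# "system" is added here only to show what happens to an s inside a word.
text = "sounds of silences system"

# Pattern and replacement from the question (Java "$2" is "\2" in Python).
from_question = analyze(text, r"([a-zA-Z]+)([s]( |$))", r"\2")
# Unanchored: removes an s even when it sits in the middle of a token.
unanchored = analyze(text, r"([a-zA-Z]+)s", r"\1")
# End-anchored: removes an s only when it is the final character.
anchored = analyze(text, r"([a-zA-Z]+)s$", r"\1")

print(from_question)  # ['s', 'of', 's', 'system']
print(unanchored)     # ['sound', 'of', 'silence', 'sytem']
print(anchored)       # ['sound', 'of', 'silence', 'system']
```

Two things stand out. The question's replacement keeps group 2 (the trailing "s") and throws away the stem, so even with a working analyzer it would not have produced "sound". And without the `$` anchor, a word like "system" loses an "s" from the middle, which is probably not what you want.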