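Using Elasticsearch 2.2, as a simple experiment, I want to remove the last character from any word that ends with the lowercase character "s". For example, the word "sounds" would be indexed as "sound".
I'm defining my analyzer like this: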
{
  "template": "document-index-template",
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)([s]( |$))",
          "replacement": "$2"
        }
      },
      "analyzer": {
        "tight": {
          "type": "standard",
          "filter": [
            "sFilter",
            "lowercase"
          ]
        }
      }
    }
  }
}
Then, when I analyze the term "sounds of silences" with this request:
<index>/_analyze?analyzer=tight&text=sounds%20of%20silences
I get:
{
  "tokens": [
    {
      "token": "sounds",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "of",
      "start_offset": 7,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "silences",
      "start_offset": 10,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
I was expecting "sounds" to become "sound" and "silences" to become "silence".
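Answer 0 (score: 3)
The analyzer settings above are not valid. An analyzer of type standard does not take a filter list, so sFilter is never applied, which is consistent with the unchanged tokens in the output above. I think what you intended is an analyzer of type custom with tokenizer set to standard.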
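Separately from the analyzer type, the replacement in the question would not give the expected tokens even once the filter runs, because group 2 of that pattern is the trailing "s" itself and group 1 is the stem. The sketch below checks this with Python's re module as a stand-in for the Java regex engine that Elasticsearch uses; the helper name replace_with_group is hypothetical, and Java's "$2" is written here as a group lookup.

```python
import re

# The pattern from the question. For this expression, Python's re syntax
# behaves the same way as Java's regex syntax.
QUESTION_PATTERN = re.compile(r"([a-zA-Z]+)([s]( |$))")


def replace_with_group(token, group):
    """Replace each match with the text of one capture group.

    Hypothetical helper: group=2 mimics the question's "$2" replacement,
    group=1 mimics "$1".
    """
    return QUESTION_PATTERN.sub(lambda m: m.group(group), token)


# Group 2 is the trailing "s", so "$2" keeps the suffix and drops the stem.
print(replace_with_group("sounds", 2))  # s
# Group 1 is the stem, which is what the experiment is after.
print(replace_with_group("sounds", 1))  # sound
```

So the replacement needs to reference the first group, not the second.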
Example:
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)s",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "tight": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "sFilter"
          ]
        }
      }
    }
  }
}
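To see what this pattern does to individual tokens, here is a rough simulation in Python. It is only a sketch: it assumes the filter replaces every match in a token, as re.sub does, and it writes Java's "$1" as "\1". The anchored variant with a trailing "$" is not part of the settings above; it is a hypothetical alternative included for comparison, and "sassy" is an extra test token that is not in the question.

```python
import re

# Pattern used in the settings above (not anchored to the end of the token).
UNANCHORED = re.compile(r"([a-zA-Z]+)s")
# Hypothetical variant that only matches an "s" at the very end of the token.
ANCHORED = re.compile(r"([a-zA-Z]+)s$")


def apply_filter(pattern, tokens):
    """Apply the replacement to each token, keeping only capture group 1."""
    return [pattern.sub(r"\1", token) for token in tokens]


tokens = ["sounds", "of", "silences", "sassy"]

print(apply_filter(UNANCHORED, tokens))
# ['sound', 'of', 'silence', 'sasy']
print(apply_filter(ANCHORED, tokens))
# ['sound', 'of', 'silence', 'sassy']
```

Both patterns turn "sounds" into "sound" and "silences" into "silence". The unanchored one can also drop an "s" from the middle of a word, so if only a trailing "s" should be removed, adding "$" to the pattern may be the safer choice.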