Elasticsearch custom analyzer being ignored

Time: 2016-02-22 22:43:29

Tags: elasticsearch analyzer

I'm using Elasticsearch 2.2.0 and I'm trying to use the lowercase + asciifolding filters on a field.

This is the output of http://localhost:9200/myindex/:

{
    "myindex": {
        "aliases": {}, 
        "mappings": {
            "products": {
                "properties": {
                    "fold": {
                        "analyzer": "folding", 
                        "type": "string"
                    }
                }
            }
        }, 
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "folding": {
                            "token_filters": [
                                "lowercase", 
                                "asciifolding"
                            ], 
                            "tokenizer": "standard", 
                            "type": "custom"
                        }
                    }
                }, 
                "creation_date": "1456180612715", 
                "number_of_replicas": "1", 
                "number_of_shards": "5", 
                "uuid": "vBMZEasPSAyucXICur3GVA", 
                "version": {
                    "created": "2020099"
                }
            }
        }, 
        "warmers": {}
    }
}

When I try to test the custom folding analyzer using the _analyze API, this is the output of http://localhost:9200/myindex/_analyze?analyzer=folding&text=%C3%89sta%20est%C3%A1%20loca:

{
    "tokens": [
        {
            "end_offset": 4, 
            "position": 0, 
            "start_offset": 0, 
            "token": "Ésta", 
            "type": "<ALPHANUM>"
        }, 
        {
            "end_offset": 9, 
            "position": 1, 
            "start_offset": 5, 
            "token": "está", 
            "type": "<ALPHANUM>"
        }, 
        {
            "end_offset": 14, 
            "position": 2, 
            "start_offset": 10, 
            "token": "loca", 
            "type": "<ALPHANUM>"
        }
    ]
}

As you can see, the returned tokens are Ésta, está, loca instead of esta, esta, loca. What's going on? It seems that the folding analyzer is being ignored.
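For reference, the effect expected from lowercase + asciifolding can be approximated locally. This is only a minimal Python sketch, not what Elasticsearch runs: it assumes that whitespace splitting is close enough to the standard tokenizer for this sentence, and that stripping combining accents after Unicode decomposition is close enough to asciifolding (the real filter covers more characters). The names fold_token and fold_text are made up for illustration.

```python
import unicodedata


def fold_token(token):
    """Lowercase a token and strip combining accents (rough asciifolding)."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_text(text):
    # Assumption: whitespace splitting stands in for the standard tokenizer.
    return [fold_token(tok) for tok in text.split()]


print(fold_text("Ésta está loca"))  # ['esta', 'esta', 'loca']
```

Comparing this list with the tokens returned by _analyze makes it clear which filters did not run.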

1 Answer:

Answer 0: (score: 1)

This looks like a simple typo made when creating your index.

In your "analysis":{"analyzer":{...}} block, this:

"token_filters": [...]

should be

"filter": [...]

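Check the documentation to confirm this. Since your filter array wasn't named correctly, ES silently ignored it. Note that your tokens keep the uppercase É, so ES does not seem to have fallen back to the standard analyzer (which would at least lowercase them); the analyzer ran the standard tokenizer with no token filters at all. Here is a small example written using the Sense Chrome plugin. Execute the requests in order: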

DELETE /test

PUT /test
{
   "settings": {
      "analysis": {
         "analyzer": {
            "folding": {
               "type": "custom",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ],
               "tokenizer": "standard"
            }
         }
      }
   }
}

GET /test/_analyze
{
    "analyzer":"folding",
    "text":"Ésta está loca"
}

And finally, the result of GET /test/_analyze:

{
   "tokens": [
      {
         "token": "esta",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "esta",
         "start_offset": 5,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "loca",
         "start_offset": 10,
         "end_offset": 14,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}
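
A misnamed key like this is easy to miss because ES accepts it without complaint. The following is a minimal Python sketch, not part of Elasticsearch, that flags unexpected keys in custom analyzer definitions before the index is created. The helper name unknown_analyzer_keys and the set of allowed keys are assumptions made for illustration; check the allowed keys against the custom analyzer documentation for your version.

```python
# Assumption: keys accepted by an analyzer of type "custom". This set is
# written by hand for the sketch, not read from Elasticsearch itself.
KNOWN_CUSTOM_ANALYZER_KEYS = {
    "type", "tokenizer", "filter", "char_filter", "position_increment_gap",
}


def unknown_analyzer_keys(index_settings):
    """Return (analyzer_name, key) pairs for keys outside the known set."""
    analyzers = index_settings.get("analysis", {}).get("analyzer", {})
    problems = []
    for name, definition in sorted(analyzers.items()):
        if definition.get("type") != "custom":
            continue  # only custom analyzers are checked in this sketch
        for key in sorted(definition):
            if key not in KNOWN_CUSTOM_ANALYZER_KEYS:
                problems.append((name, key))
    return problems


# The analysis settings from the question, including the misnamed key.
settings = {
    "analysis": {
        "analyzer": {
            "folding": {
                "type": "custom",
                "tokenizer": "standard",
                "token_filters": ["lowercase", "asciifolding"],
            }
        }
    }
}

print(unknown_analyzer_keys(settings))  # [('folding', 'token_filters')]
```

Running the same check on the corrected settings, with "filter" in place of "token_filters", returns an empty list.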