Elasticsearch: how to remove leading zeros from a string

Time: 2015-09-14 16:17:15

Tags: elasticsearch

I am trying to remove leading zeros from data inserted into Elasticsearch, but the data needs to be handled as a string, not as a number. For example, "1234", "01234" and "01234test" should all be handled. In this example, searching for "1234" should return 2 results. How can I achieve this? Is there a filter or char_filter I can use in the following mapping?

{  
   "settings":{  
      "analysis":{  
         "analyzer":{
            "diacritical":{  
               "type":"custom",
               "tokenizer":"standard",
               "filter":[  
                  "standard",
                  "lowercase",
                  "asciifolding",
                  "nfd_normalizer"
               ]
            }
         },
         "filter":{  
            "nfd_normalizer":{  
               "type":"icu_normalizer",
               "name":"nfc"
            }
         }
      }
   },
   "mappings":{  
      "testType":{  
         "_timestamp":{  
            "enabled":"true",
            "store":"yes"
         },
         "properties":{  
            "mynumber":{  
               "store":"yes",
               "type":"string",
               "index":"analyzed",
               "analyzer":"diacritical"
            }
         }
      }
   }
}

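1 answer:

Answer 0 (score: 2)

One approach is to construct a pattern replace filter that operates on the tokens emitted by the standard tokenizer.

Something along these lines should work for the examples in the OP: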

"leading_zero_trim":{
    "type":"pattern_replace",
    "pattern":"^0+(.*)",
    "replacement":"$1"
}
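
To see what this pattern does to the OP's three values, here is a minimal Python sketch. It only mimics the regex with Python's `re` module (Elasticsearch uses Java regex, where the replacement is written `$1` rather than `\1`), and the helper name `strip_leading_zeros` is made up for illustration.

```python
import re

# Python stand-in for the pattern_replace filter: "^0+(.*)" -> "$1"
LEADING_ZEROS = re.compile(r"^0+(.*)")

def strip_leading_zeros(token):
    # Tokens without leading zeros do not match and come back unchanged.
    return LEADING_ZEROS.sub(r"\1", token)

examples = ["1234", "01234", "01234test"]
stripped = [strip_leading_zeros(t) for t in examples]
print(stripped)  # ['1234', '1234', '1234test']
```

Two of the three values end up as the token "1234", which is why a search for "1234" would be expected to return 2 results.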

Example settings:

{
   "settings": {
      "analysis": {
         "analyzer": {
            "diacritical": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "standard",
                  "lowercase",
                  "asciifolding",
                  "nfd_normalizer",
                  "leading_zero_trim",
                  "trim_zero_length"
               ]
            }
         },
         "filter": {
            "nfd_normalizer": {
               "type": "icu_normalizer",
               "name": "nfc"
            },
            "leading_zero_trim": {
               "type": "pattern_replace",
               "pattern": "^0+(.*)",
               "replacement": "$1"
            },
            "trim_zero_length": {
               "type": "length",
               "min": 1
            }
         }
      }
   }
}

Test the analyzer:

GET <index_name>/_analyze?analyzer=diacritical&text=hello omarta 01234 12340 123404 0001 000 0123test
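
As a rough idea of which tokens to expect back from that request, the filter chain can be approximated in Python. This is only a sketch under assumptions: a whitespace split stands in for the standard tokenizer, the asciifolding and ICU normalizer steps are left out because the sample text is plain ASCII, and the function name `analyze` is hypothetical, not an Elasticsearch API.

```python
import re

def analyze(text):
    tokens = text.split()                 # assumption: stand-in for the standard tokenizer
    tokens = [t.lower() for t in tokens]  # lowercase filter
    # leading_zero_trim: same regex as the pattern_replace filter
    tokens = [re.sub(r"^0+(.*)", r"\1", t) for t in tokens]
    # trim_zero_length: the length filter with min 1 drops tokens that became empty
    return [t for t in tokens if len(t) >= 1]

tokens = analyze("hello omarta 01234 12340 123404 0001 000 0123test")
print(tokens)  # ['hello', 'omarta', '1234', '12340', '123404', '1', '123test']
```

Note that "000" is reduced to an empty string by the pattern and is then removed by the length filter, which is the reason trim_zero_length is in the chain.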