Elasticsearch email analyzer with uax_url_email tokenizer splits emails at the @

Time: 2018-06-01 15:37:17

Tags: elasticsearch

I have an email analyzer in Elasticsearch 1.7, and I want it to treat each whole string as a single email token rather than splitting it in any way. However, email input is being split at the @ character.

Here is my template:

{
  "template": "someindex*",
  "settings": {
    "index.analysis.filter.length-filter.min": "8",
    "index.analysis.analyzer.default.stopwords": "_none_",
    "index.analysis.filter.length-filter.type": "length",
    "index.analysis.filter.length-filter.max": "4999",
    "index.mapper.dynamic": "true",
    "index.analysis.analyzer.default.type": "standard",
    "index.analysis.analyzer.email-analyzer.filter" : ["lowercase","unique"],
    "index.analysis.analyzer.email-analyzer.type" : "custom",
    "index.analysis.tokenizer.email-tokenizer.type" : "uax_url_email",
    "index.analysis.analyzer.email-analyzer.tokenizer" : "email-tokenizer"
  },
  "mappings": {
    "_default_": {
      "properties": {
        "email": {
          "index_analyzer" : "email-analyzer",
          "search_analyzer" : "email-analyzer",
          "type" : "string",
          "fields" : {
            "raw" : {
              "index" : "not_analyzed",
              "ignore_above" : 256,
              "type" : "string"
            }
          }
        }
      },
      "_all": {
        "enabled": true,
        "omit_norms": true
      }
    }
  },
  "aliases": {
    "someindex": {}
  }
}
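
For readability, here is a minimal sketch of just the analyzer-related settings rewritten in nested form (in Elasticsearch 1.x this is equivalent to the flat dot notation used in the template above); the nested layout makes it easier to verify that the custom tokenizer is actually registered and referenced by the analyzer:

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "email-tokenizer": {
          "type": "uax_url_email"
        }
      },
      "analyzer": {
        "email-analyzer": {
          "type": "custom",
          "tokenizer": "email-tokenizer",
          "filter": ["lowercase", "unique"]
        }
      }
    }
  }
}
```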

When I execute this:

$ curl -XGET 'http://localhost:9200/someindex/_analyze?analyzer=email-analyzer' -d 'test.me@gmail.com'
{
  "tokens": [
    {"token": "test.me", "start_offset": 0, "end_offset": 7, "type": "<ALPHANUM>", "position": 1},
    {"token": "gmail.com", "start_offset": 8, "end_offset": 17, "type": "<ALPHANUM>", "position": 2}
  ]
}

I see that the email is being split, even though I have defined the uax_url_email tokenizer for that specific analyzer.

What am I doing wrong here?

Thanks for your help!

0 Answers:

No answers