How to select the longest token in an ES filter

Asked: 2019-09-23 08:02:35

Tags: elasticsearch

The input is a list of person names, and I want to build a slightly fuzzy exact match.

The indexed text is "Bao-An Feng", and my analyzer is below:

PUT trim
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "word_joiner": {
            "type": "shingle",
            "output_unigrams": false,
            "token_separator": "",
            "output_unigrams_if_no_shingles": true,
            "max_shingle_size": 5
          }
        },
        "analyzer": {
          "word_join_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "word_joiner"
            ]
          }
        },
        "tokenizer": {}
      }
    }
  }
}

It produces these three tokens:

{
  "tokens": [
    {
      "token": "baoan",
      "start_offset": 0,
      "end_offset": 6,
      "type": "shingle",
      "position": 0
    },
    {
      "token": "baoanfeng",
      "start_offset": 0,
      "end_offset": 11,
      "type": "shingle",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "anfeng",
      "start_offset": 4,
      "end_offset": 11,
      "type": "shingle",
      "position": 1
    }
  ]
}
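The three tokens above follow directly from how the shingle filter works: the standard tokenizer splits "Bao-An Feng" into three words, lowercasing is applied, and every run of 2 to 5 consecutive tokens is joined with the empty separator. A minimal Python sketch of that logic (the shingles function is my own illustration, not an Elasticsearch API):

```python
def shingles(tokens, min_size=2, max_size=5, separator=""):
    """Join every run of min_size..max_size consecutive tokens,
    mimicking a shingle filter with output_unigrams=false and
    token_separator=""."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
    return out

tokens = "bao an feng".split()  # what standard + lowercase yields for "Bao-An Feng"
print(shingles(tokens))        # ['baoan', 'baoanfeng', 'anfeng']
```

Note that "baoanfeng" is just one of the emitted shingles; the filter has no notion of "keep only the longest", which is the crux of the question.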

I only want "baoanfeng". I can't simply raise "min_shingle_size", because the input can also be just two words.

1 answer:

Answer 0 (score: 0)

If what you need is the longest shingle, I'm not sure why you're using a shingle filter at all...

Why not simply use the keyword tokenizer together with a pattern_replace filter that strips out all non-word characters? Like this:

PUT trim
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "pattern": {
            "type": "pattern_replace",
            "pattern": "\\W+",
            "replacement": ""
          }
        },
        "analyzer": {
          "word_join_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "pattern"
            ]
          }
        },
        "tokenizer": {}
      }
    }
  }
}

Then test it:

POST trim/_analyze
{
  "analyzer": "word_join_analyzer",
  "text": "Bao-An Feng"
}
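This chain (keyword tokenizer, then lowercase, then pattern_replace with \W+ removed) keeps the whole input as a single token, so the _analyze call is expected to return just "baoanfeng". A rough Python emulation of the chain (the function name is my own, not an ES API):

```python
import re

def word_join_analyze(text):
    """Emulate: keyword tokenizer (whole input = one token),
    lowercase filter, then pattern_replace deleting \\W+ runs."""
    token = text.lower()              # lowercase filter
    return re.sub(r"\W+", "", token)  # pattern_replace: \W+ -> ""

print(word_join_analyze("Bao-An Feng"))  # baoanfeng
```

Because only one token ever comes out, there is no "longest shingle" to pick: the problem disappears entirely.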