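Elasticsearch tokenizer and filter for splitting given data

Time: 2015-01-06 10:41:15

Tags: java elasticsearch elasticsearch-plugin

I am stuck trying to split my data into the output I expect, and I cannot get it to work. I have tried all of the filters and tokenizers. I have updated the settings in Elasticsearch as shown below.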

    {
      "settings": {
        "analysis": {
          "filter": {
            "filter_word_delimiter": {
              "preserve_original": "true",
              "type": "word_delimiter"
            }
          },
          "analyzer": {
            "en_us": {
              "tokenizer": "keyword",
              "filter": [ "filter_word_delimiter", "lowercase" ]
            }
          }
        }
      }
    }

Query executed:

    curl -XGET "XX.XX.XX.XX:9200/keyword/_analyze?pretty=1&analyzer=en_us" -d 'DataGridControl'

Resulting tokens:

{
  "tokens" : [ {
    "token" : "datagridcontrol",
    "start_offset" : 0,
    "end_offset" : 16,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "data",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "grid",
    "start_offset" : 4,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "control",
    "start_offset" : 9,
    "end_offset" : 16,
    "type" : "word",
    "position" : 3
  } ]
}

Expected result, something like:

    DataGridControl
    DataGrid
    DataControl
    Data
    Grid
    Control

What type of tokenizer and filter should I add to the index settings? Any help would be appreciated.

1 Answer:

Answer 0 (score: 1)

Try this:

{
  "settings": {
    "analysis": {
      "filter": {
        "filter_word_delimiter": {
          "type": "word_delimiter"
        },
        "custom_shingle": {
          "type": "shingle",
          "token_separator":"",
          "max_shingle_size":3
        }
      },
      "analyzer": {
        "en_us": {
          "tokenizer": "keyword",
          "filter": [
            "filter_word_delimiter",
            "custom_shingle",
            "lowercase"
          ]
        }
      }
    }
  }
}
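To get a feel for what this filter chain should emit before reindexing, here is a rough Python approximation. It is only a sketch, not the Lucene implementation: `split_on_case_change`, `shingles`, and `analyze` are hypothetical helper names, the regex handles only simple CamelCase input, and the shingle step assumes the filter's defaults of also emitting single tokens and joining only adjacent ones.

```python
import re


def split_on_case_change(text):
    """Rough stand-in for word_delimiter splitting on case changes.

    Assumption: the input is simple CamelCase or digits; the real filter
    handles many more cases (punctuation, possessives, and so on).
    """
    return re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", text)


def shingles(tokens, max_size=3, separator=""):
    """Emit each token plus the joined runs of up to max_size adjacent tokens."""
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
    return out


def analyze(text):
    """Approximate the chain: word_delimiter -> shingle -> lowercase."""
    parts = split_on_case_change(text)
    return [token.lower() for token in shingles(parts)]


print(analyze("DataGridControl"))
# ['data', 'datagrid', 'datagridcontrol', 'grid', 'gridcontrol', 'control']
```

Under this approximation the chain produces `datagrid` and `gridcontrol` but not `datacontrol`, because shingles only join neighbouring tokens. The real output should be checked with the same `_analyze` call used in the question.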

Let me know if that gets you any closer.