How to use the shingle token filter in Elasticsearch to get shingles that start from the beginning of the string

Time: 2015-11-29 03:51:46

Tags: elasticsearch

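I want to use the shingle token filter to analyze the string "The quick brown fox jumps over the lazy dog" into:

1. The

2. The quick

...

n. The quick brown fox jumps over the lazy dog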

I need help. Thanks.

1 Answer:

Answer 0: (score: 0)

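With the following index settings, which define a custom analyzer built on the shingle token filter, you will be able to generate the terms you expect: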

curl -XPUT localhost:9200/your_index -d '{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "my_shingles": {
            "tokenizer": "standard",
            "filter": [
              "shingles"
            ]
          }
        },
        "filter": {
          "shingles": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 10
          }
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "field": {
          "type": "string",
          "analyzer": "my_shingles"
        }
      }
    }
  }
}'

Then we can ask the _analyze endpoint to show how it tokenizes your sentence:

 curl -XGET 'localhost:9200/your_index/_analyze?analyzer=my_shingles&pretty' -d 'The quick brown fox jumps over the lazy dog'

The response will be:

{
  "tokens" : [ {
    "token" : "The",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "The quick",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox",
    "start_offset" : 0,
    "end_offset" : 19,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps",
    "start_offset" : 0,
    "end_offset" : 25,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps over",
    "start_offset" : 0,
    "end_offset" : 30,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps over the",
    "start_offset" : 0,
    "end_offset" : 34,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps over the lazy",
    "start_offset" : 0,
    "end_offset" : 39,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps over the lazy dog",
    "start_offset" : 0,
    "end_offset" : 43,
    "type" : "shingle",
    "position" : 1
  }, {
  ...

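You will also notice that more shingles are produced than these (the ones that start at later words, such as "quick brown"), but the ones above do match what you expect.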
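To make it easier to see which terms come out and which of them start at the first word, here is a minimal Python sketch that imitates the filter outside Elasticsearch. It is not Elasticsearch's implementation: the `shingles` helper and its parameter names are hypothetical, and a plain whitespace split stands in for the standard tokenizer, which is assumed to produce the same tokens for this punctuation-free sentence. The sizes 2 and 10 mirror `min_shingle_size` and `max_shingle_size` from the settings above, and single words are kept because the filter emits unigrams by default.

```python
def shingles(tokens, min_size=2, max_size=10, output_unigrams=True):
    """Return (start_index, text) pairs, roughly imitating a shingle filter.

    Hypothetical helper for illustration only; not Elasticsearch code.
    """
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            # The single word itself, as in the "The" token of the response.
            out.append((start, tokens[start]))
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough words left for a shingle of this size
            out.append((start, " ".join(tokens[start:end])))
    return out


sentence = "The quick brown fox jumps over the lazy dog"
tokens = sentence.split()  # assumption: stand-in for the standard tokenizer

all_terms = shingles(tokens)
# Keep only the terms that begin at the first word of the sentence.
prefix_terms = [text for start, text in all_terms if start == 0]

print(len(all_terms))     # 45 terms in total (9 single words + 36 shingles)
print(len(prefix_terms))  # 9 terms start at the first word
for term in prefix_terms:
    print(term)
```

The nine terms printed at the end are the ones listed in the question, from "The" up to the full sentence. The other 36 are the additional shingles mentioned above. If only the terms starting at the first word are needed, filtering them on the client side as done here is one possible approach.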