ngram elasticsearch

Date: 2016-09-23 06:58:11

Tags: elasticsearch elasticsearch-plugin analyzer n-gram

curl -XPUT 'http://localhost:9200/testsoundi' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_edge_ngram_analyzer": {
          "tokenizer": "my_edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "5",
          "token_chars": ["letter", "digit", "whitespace"]
        }
      }
    }
  }
}'


soundarya@soundarya-VirtualBox:~/Downloads/elasticsearch-2.4.0/bin$ curl 'localhost:9200/testsoundi/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'wonder'

But the output I get is wo, won, wond, and so on. If I set max_gram to 3, I only get tokens up to the third letter ('wo', 'won').

I am expecting output like:

won 
ond
nde
der

Can anyone help me?

1 answer:

Answer 0: (score: 1)

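Good job, you're almost there. First, you need an nGram tokenizer, not edgeNGram. The difference is that the latter only produces tokens anchored at the beginning of the word, while the former produces every token of the requested length, regardless of where it sits in the word.

Second, if you need tokens of length 3, both min_gram and max_gram need to be 3.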

curl -XPUT 'http://localhost:9200/testsoundi' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit",
            "whitespace"
          ]
        }
      }
    }
  }
}'
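
To make the difference between the two tokenizers concrete, here is a minimal Python sketch of what each one emits for a single word. It is only an illustration, not Elasticsearch's implementation: the function names edge_ngrams and ngrams are made up for this example, and it ignores token_chars by treating the input as one unbroken token.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Prefixes of `word` with lengths min_gram..max_gram (edge n-grams).

    Hypothetical helper for illustration; not part of any Elasticsearch API.
    """
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


def ngrams(word, min_gram, max_gram):
    """Substrings of `word` with lengths min_gram..max_gram at every start position.

    Hypothetical helper for illustration; not part of any Elasticsearch API.
    """
    return [
        word[start:start + n]
        for start in range(len(word))
        for n in range(min_gram, max_gram + 1)
        if start + n <= len(word)
    ]


# Settings from the question: edge n-grams, min_gram=2, max_gram=5
print(edge_ngrams("wonder", 2, 5))  # ['wo', 'won', 'wond', 'wonde']

# Edge n-grams with max_gram lowered to 3: still only prefixes
print(edge_ngrams("wonder", 2, 3))  # ['wo', 'won']

# Settings from the answer: plain n-grams, min_gram=3, max_gram=3
print(ngrams("wonder", 3, 3))       # ['won', 'ond', 'nde', 'der']
```

The last call yields the tokens the question asks for, which is why min_gram and max_gram are both set to 3 on the nGram tokenizer.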