How to get word trigrams in Elasticsearch

Time: 2014-04-29 06:14:31

Tags: curl lucene elasticsearch n-gram

I have been trying to get trigrams using the Elasticsearch tokenizers. I followed the tutorials at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html and http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams

Following those docs and testing the analyzer with

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'

generates character-level nGrams like

# FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04

whereas what I want are whole-word trigrams.

For example, the trigrams for "the quick red fox jumps over the lazy brown dog" would be:

the quick red
quick red fox
red fox jumps
fox jumps over
jumps over the
over the lazy
the lazy brown
lazy brown dog

In short, how do I create the trigrams above using Elasticsearch?

1 answer:

Answer 0: (score: 3)

Found it. The answer lies in the shingle filter. This index configuration makes it work:

{
   "settings": {
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "shingle",
               "max_shingle_size": 3,
               "min_shingle_size": 3,
               "output_unigrams": false
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            },
            "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   }
}

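The key properties here are "type": "shingle" and the min/max shingle sizes. Setting both min_shingle_size and max_shingle_size to 3 yields only three-word shingles, and "output_unigrams": false suppresses the single-word tokens that the filter would otherwise emit alongside them.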
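
To make the effect of that filter chain easier to see, here is a rough pure-Python sketch of what a whitespace tokenizer, a lowercase filter, and a size-3 shingle filter produce together. It is only an illustration, not how Elasticsearch implements it: the function name word_shingles is hypothetical, and the asciifolding step is left out for brevity.

```python
def word_shingles(text, size=3):
    """Return word-level n-grams (shingles) of exactly `size` tokens.

    Hypothetical helper for illustration only; it approximates the
    whitespace tokenizer + lowercase + shingle(min=max=size) chain and
    skips asciifolding.
    """
    # Rough stand-in for the whitespace tokenizer plus the lowercase filter.
    tokens = text.lower().split()
    # Slide a window of `size` tokens across the list; inputs with fewer
    # than `size` tokens give an empty result, since no unigrams are emitted.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


if __name__ == "__main__":
    for shingle in word_shingles("the quick red fox jumps over the lazy brown dog"):
        print(shingle)
```

Run on the sentence from the question, this should print the eight trigrams listed there, starting with "the quick red" and ending with "lazy brown dog".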