Return a set of keywords derived from a field in Elasticsearch

Date: 2017-03-24 02:29:54

Tags: elasticsearch

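I'm fairly new to this and need some help; I couldn't find the answer I'm looking for online. Basically, what I'm trying to do is autocomplete based on keywords derived from some text fields.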

Here is an example of the documents in my index:

"name": "One liter of Chocolate Milk"
"name": "Milo Milk 250g"
"name": "HiLow low fat milk"
"name": "Yoghurt strawberry"
"name": "Milk Nutrisoy"

So when I type "mi", I'd like to get results such as:

"milk"
"milo"
"milo milk"
"chocolate milk" 
etc

A very good example of what I mean is the autocomplete on aliexpress.com.

Thanks in advance.

1 answer:

Answer 0 (score: 2)

This looks like a good use case for the shingle token filter:
curl -XPUT localhost:9200/your_index -d '{
  "settings": {
      "analysis": {
        "analyzer": {
          "my_shingles": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "shingles"
            ]
          }
        },
        "filter": {
          "shingles": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 2,
            "output_unigrams": true
          }
        }
      }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "field": {
          "type": "string",
          "analyzer": "my_shingles"
        }
      }
    }
  }
}'

If you analyze "Milo Milk 250g" with this analyzer, you get the following tokens:

curl -XGET 'localhost:9200/your_index/_analyze?analyzer=my_shingles&pretty' -d 'Milo Milk 250g'

{
  "tokens" : [ {
    "token" : "milo",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "milo milk",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "shingle",
    "position" : 0
  }, {
    "token" : "milk",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "milk 250g",
    "start_offset" : 5,
    "end_offset" : 14,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "250g",
    "start_offset" : 10,
    "end_offset" : 14,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}
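To check what the `my_shingles` analyzer produces without a running cluster, the token stream above can be approximated in plain Bash. This is a minimal sketch, not Elasticsearch's implementation: the function name `shingles` is hypothetical, and splitting on whitespace is an assumption standing in for the standard tokenizer (fine for simple product names, but it differs on punctuation). It lowercases the input and emits each word followed by the 2-word shingle that starts at it, mirroring `output_unigrams: true` with `min_shingle_size`/`max_shingle_size` of 2.

```shell
#!/usr/bin/env bash

# Hypothetical helper approximating the my_shingles analyzer:
#   1. lowercase the input (the "lowercase" filter)
#   2. split on whitespace (ASSUMPTION: stands in for the standard tokenizer)
#   3. emit each word (unigram), then the 2-word shingle starting at it
shingles() {
  local -a words
  local i
  read -r -a words <<< "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  for ((i = 0; i < ${#words[@]}; i++)); do
    printf '%s\n' "${words[i]}"
    # The last word has no following word, so no shingle starts there.
    if ((i + 1 < ${#words[@]})); then
      printf '%s %s\n' "${words[i]}" "${words[i + 1]}"
    fi
  done
}

shingles 'Milo Milk 250g'
```

Run on "Milo Milk 250g", it prints `milo`, `milo milk`, `milk`, `milk 250g`, `250g`, one per line, in the same order as the `_analyze` output.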

So when searching for the prefix "mi", the following tokens from this document match:

  • milo
  • milo milk
  • milk
  • milk 250g
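Applying the same idea across all of the example names from the question gives a rough picture of the suggestion candidates a prefix search on the shingled field could surface. This is a sketch under stated assumptions: `shingles` and `suggest` are hypothetical helper names, whitespace splitting again stands in for the standard tokenizer, and a plain string-prefix test on each whole shingle stands in for a prefix query against the index.

```shell
#!/usr/bin/env bash

# Hypothetical helper: lowercase, split on whitespace (ASSUMPTION: approximates
# the standard tokenizer), emit unigrams plus 2-word shingles.
shingles() {
  local -a words
  local i
  read -r -a words <<< "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  for ((i = 0; i < ${#words[@]}; i++)); do
    printf '%s\n' "${words[i]}"
    if ((i + 1 < ${#words[@]})); then
      printf '%s %s\n' "${words[i]}" "${words[i + 1]}"
    fi
  done
}

# Hypothetical helper: suggest PREFIX NAME...
# Prints every distinct shingle that starts with PREFIX (case-insensitive),
# sorted in byte order so the output does not depend on the locale.
suggest() {
  local prefix name
  prefix="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  shift
  for name in "$@"; do
    shingles "$name"
  done | awk -v p="$prefix" 'index($0, p) == 1' | LC_ALL=C sort -u
}

suggest mi \
  'One liter of Chocolate Milk' \
  'Milo Milk 250g' \
  'HiLow low fat milk' \
  'Yoghurt strawberry' \
  'Milk Nutrisoy'
```

For "mi" this prints `milk`, `milk 250g`, `milk nutrisoy`, `milo` and `milo milk`. Note that "chocolate milk", which the question also lists, is not returned by a pure prefix match on the whole shingle, because that shingle begins with "chocolate".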