Elasticsearch: autocomplete on keywords within a string

Time: 2015-10-03 00:01:17

Tags: autocomplete elasticsearch

My documents look like this:

   "hits": {
      "total": 4,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_db2",
            "_type": "test",
            "_id": "1",
            "_score": 1,
            "_source": {
               "name": "very cool shoes",
               "price": 26
            }
         },
         {
            "_index": "test_db2",
            "_type": "test",
            "_id": "2",
            "_score": 1,
            "_source": {
               "name": "great shampoo",
               "price": 15
            }
         },
         {
            "_index": "test_db2",
            "_type": "test",
            "_id": "3",
            "_score": 1,
            "_source": {
               "name": "shirt",
               "price": 25
            }
         }
      ]
    }

How can I build an autocomplete feature in Elasticsearch? For example, after I type "sh" into the input, I should see these results:

  

SHOES

SHampoo

SHIRT

.....

[Image: Example of what I need]

1 answer:

Answer 0 (score: 0)

Take a look at ngrams. Or actually, edge ngrams are probably all you need.
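To make the difference concrete, here is a minimal Python sketch of the two ideas. It is not Elasticsearch code, and the function names `ngrams` and `edge_ngrams` are made up for illustration. It only shows which substrings each approach produces for a single word.

```python
def ngrams(word, min_gram, max_gram):
    """All substrings of length min_gram..max_gram, from every start position."""
    out = []
    for start in range(len(word)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(word):
                out.append(word[start:start + size])
    return out


def edge_ngrams(word, min_gram, max_gram):
    """Only the substrings anchored at the start of the word (its prefixes)."""
    longest = min(max_gram, len(word))
    return [word[:size] for size in range(min_gram, longest + 1)]


print(ngrams("shirt", 2, 3))        # ['sh', 'shi', 'hi', 'hir', 'ir', 'irt', 'rt']
print(edge_ngrams("shirt", 2, 15))  # ['sh', 'shi', 'shir', 'shirt']
```

For type-ahead you normally only care about what the user has typed so far, which is a prefix, so the edge variant keeps the index smaller and avoids matches in the middle of a word.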

Qbox has a couple of blog posts about setting up autocomplete with ngrams, so for a more in-depth discussion I would refer you to these:

https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch

https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams

But very quickly, this should get you started.

First I set up the index:

PUT /test_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "autocomplete": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "standard",
                  "stop",
                  "kstem",
                  "edgengram_filter"
               ]
            }
         },
         "filter": {
            "edgengram_filter": {
               "type": "edgeNGram",
               "min_gram": 2,
               "max_gram": 15
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "name": {
               "type": "string",
               "index_analyzer": "autocomplete",
               "search_analyzer": "standard"
            },
            "price":{
                "type": "integer"
            }
         }
      }
   }
}
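The mapping above analyzes the field one way at index time ("autocomplete") and another way at search time ("standard"). A rough Python sketch of why that asymmetry matters follows. It is only a stand-in for the real analyzers: tokenizing is a plain whitespace split, the stop list is a tiny illustrative subset, stemming (kstem) is left out entirely, and all function names are hypothetical.

```python
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}  # tiny illustrative subset


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Prefixes of the token, using the min_gram/max_gram from the settings."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text):
    """Rough stand-in for the 'autocomplete' analyzer (stemming omitted)."""
    terms = set()
    for token in text.split():
        if token in STOP_WORDS:
            continue
        terms.update(edge_ngrams(token))
    return terms


def search_terms(text):
    """Rough stand-in for the 'standard' search analyzer: whole lowercased words."""
    return set(text.lower().split())


def matching_ids(docs, query_terms):
    """Ids of documents sharing at least one term with the query."""
    return sorted(doc_id for doc_id, name in docs.items()
                  if index_terms(name) & query_terms)


docs = {1: "very cool shoes", 2: "great shampoo", 3: "shirt"}

print(matching_ids(docs, search_terms("sh")))     # [1, 2, 3]
print(matching_ids(docs, search_terms("shirt")))  # [3]
# If the query were edge-ngrammed too, "shirt" would also emit "sh":
print(matching_ids(docs, index_terms("shirt")))   # [1, 2, 3]
```

The last line is the case the separate search analyzer avoids: a fully typed word would otherwise be broken into short prefixes and match nearly everything.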

Then I indexed your documents:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name": "very cool shoes","price": 26}
{"index":{"_id":2}}
{"name": "great shampoo","price": 15}
{"index":{"_id":3}}
{"name": "shirt","price": 25}
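If you are producing that bulk body from application code rather than typing it into Sense, something along these lines should work. This is a sketch using only Python's standard `json` module; `bulk_body` is a hypothetical helper name, and sending the result over HTTP is left out.

```python
import json


def bulk_body(docs):
    """Build the newline-delimited body for a _bulk request.

    docs maps a document id to its source dict. Each document becomes two
    lines: an action line and the source line.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = {
    1: {"name": "very cool shoes", "price": 26},
    2: {"name": "great shampoo", "price": 15},
    3: {"name": "shirt", "price": 25},
}
body = bulk_body(docs)
print(body, end="")
```

Each JSON object has to stay on a single line, which is why the helper serializes with `json.dumps` and no indentation.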

Now I can get autocomplete results with a simple match query:

POST /test_index/_search
{
   "query": {
      "match": {
         "name": "sh"
      }
   }
}

which returns:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0.30685282,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": 0.30685282,
            "_source": {
               "name": "shirt",
               "price": 25
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.19178301,
            "_source": {
               "name": "great shampoo",
               "price": 15
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
               "name": "very cool shoes",
               "price": 26
            }
         }
      ]
   }
}
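The question asks for just the matching word with the typed part emphasized (SHOES, SHampoo, SHIRT), whereas the hits contain the whole name. One possible way to bridge that gap is a small client-side step over the returned names. The sketch below assumes you have already pulled the `name` values out of the hits; `suggestions` is a hypothetical helper, and upper-casing is just one way to mark the typed prefix.

```python
def suggestions(names, typed):
    """For each hit name, return the words that start with what was typed,
    with the typed part upper-cased to mark it."""
    typed = typed.lower()
    out = []
    for name in names:
        for word in name.split():
            if word.lower().startswith(typed):
                out.append(typed.upper() + word[len(typed):])
    return out


hit_names = ["shirt", "great shampoo", "very cool shoes"]  # order of the hits above
print(suggestions(hit_names, "sh"))  # ['SHirt', 'SHampoo', 'SHoes']
```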

Here is the code I used to test it:

http://sense.qbox.io/gist/0886488ddfb045c69eed67b15e9734187c8b2491