Elasticsearch: generating the suggest field

Date: 2015-07-22 13:46:28

Tags: elasticsearch

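I have been reading about suggesters in Elasticsearch on the blog, for example: https://www.elastic.co/blog/you-complete-me

However, you have to fill name_suggest with the data yourself; it is not populated automatically when the object is mapped.

So, given this mapping: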

curl -X PUT localhost:9200/hotels -d '
{
  "mappings": {
    "hotel" : {
      "properties" : {
        "name" : { "type" : "string" },
        "city" : { "type" : "string" },
        "name_suggest" : {
          "type" :     "completion"
        }
      } 
    }
  }
}'

and these PUT requests:

curl -X PUT localhost:9200/hotels/hotel/1 -d '
{
  "name" :         "Mercure Hotel Munich",
  "city" :         "Munich",
  "name_suggest" : "Mercure Hotel Munich"
}'
curl -X PUT localhost:9200/hotels/hotel/2 -d '
{
  "name" :         "Hotel Monaco",
  "city" :         "Munich",
  "name_suggest" : "Hotel Monaco"
}'
curl -X PUT localhost:9200/hotels/hotel/3 -d '
{
  "name" :         "Courtyard by Marriot Munich City",
  "city" :         "Munich",
  "name_suggest" : "Courtyard by Marriot Munich City"
}'

The idea is that we could then drop the name_suggest field from these requests.

The end goal is that when you start typing Ho, the first result is Hotel.

1 answer:

Answer 0 (score: 0)

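If you want partial matches inside words, you can use ngrams; if you only want to match from the beginning of a word, you can use edge ngrams.

Here is an example. I set up an index like this: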
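To make the difference concrete, here is a minimal Python sketch of what the two token filters produce for a single word. The function names `ngrams` and `edge_ngrams` are my own and purely illustrative; this is not Elasticsearch code, only an approximation of the idea.

```python
def ngrams(word, min_gram, max_gram):
    """All substrings of `word` with a length between min_gram and max_gram."""
    return [
        word[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(word) - n + 1)
    ]


def edge_ngrams(word, min_gram, max_gram):
    """Only the prefixes of `word` with a length between min_gram and max_gram."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


print(edge_ngrams("hotel", 2, 20))           # ['ho', 'hot', 'hote', 'hotel']
print("ote" in ngrams("hotel", 2, 20))       # True: matches inside the word
print("ote" in edge_ngrams("hotel", 2, 20))  # False: prefixes only
```

Since the goal in the question is to match what the user has typed so far, prefixes are enough, which is why the example uses edge ngrams.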

PUT /test_index
{
    "settings": {
      "analysis": {
         "filter": {
            "edge_ngram_filter": {
               "type": "edge_ngram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "edge_ngram_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "edge_ngram_filter"
               ]
            }
         }
      }
   },
   "mappings": {
       "doc": {
           "properties": {
               "name": {
                   "type": "string",
                   "index_analyzer": "edge_ngram_analyzer",
                   "search_analyzer": "standard"
               },
               "city": {
                   "type": "string"
               }
           }
       }
   }
}

Then I added your documents:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name":"Mercure Hotel Munich","city":"Munich"}
{"index":{"_id":2}}
{"name":"Hotel Monaco","city":"Munich"}
{"index":{"_id":3}}
{"name":"Courtyard by Marriot Munich City","city":"Munich"}

Now I can query for documents whose name contains "hot", like this:

POST /test_index/_search
{
    "query": {
        "match": {
           "name": "hot"
        }
    }
}

I get back the correct documents:

{
   "took": 41,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.625,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.625,
            "_source": {
               "name": "Hotel Monaco",
               "city": "Munich"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.5,
            "_source": {
               "name": "Mercure Hotel Munich",
               "city": "Munich"
            }
         }
      ]
   }
}
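Why exactly these two documents match can be sketched in plain Python. This is a rough simulation under stated assumptions, not how Elasticsearch is implemented: `tokenize` is a crude stand-in for the standard tokenizer plus lowercase filter, the helper names are hypothetical, and scoring is ignored, so results come back sorted by ID rather than by relevance.

```python
import re

DOCS = {
    1: "Mercure Hotel Munich",
    2: "Hotel Monaco",
    3: "Courtyard by Marriot Munich City",
}


def tokenize(text):
    """Rough stand-in for the standard tokenizer plus the lowercase filter."""
    return re.findall(r"\w+", text.lower())


def index_terms(text, min_gram=2, max_gram=20):
    """Index-time analysis: every word is expanded into its edge n-grams."""
    terms = set()
    for word in tokenize(text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            terms.add(word[:n])
    return terms


def search(query):
    """Search-time analysis: query words are lowercased but not expanded."""
    query_terms = set(tokenize(query))
    return sorted(doc_id for doc_id, name in DOCS.items()
                  if query_terms & index_terms(name))


print(search("hot"))   # [1, 2]
print(search("mu"))    # [1, 3]
print(search("otel"))  # []
```

The point of using a different analyzer at search time is that the query "hot" stays a single term and is compared against the prefixes stored at index time. If the query were also expanded into edge ngrams, "hot" would additionally produce "ho" and match every word starting with those two letters.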

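This can be tuned or generalized in several ways. For example, if you want to match on multiple fields, you can apply the ngram analyzer to the _all field.

Here is the code I used to test this example: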
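As an untested sketch of the _all variant, the mapping part of the index body could look roughly like the Python dict below. It assumes the Elasticsearch 1.x syntax used in the rest of this answer, where _all accepts index_analyzer and search_analyzer, and it assumes the settings block that defines edge_ngram_analyzer is kept as shown earlier. Check it against the documentation for your version before relying on it.

```python
import json

# Assumption: Elasticsearch 1.x mapping syntax, mirroring the "name" field
# from the index definition earlier in this answer.
mappings = {
    "doc": {
        "_all": {
            "index_analyzer": "edge_ngram_analyzer",
            "search_analyzer": "standard",
        },
        "properties": {
            "name": {"type": "string"},
            "city": {"type": "string"},
        },
    }
}

# Query the combined field instead of a single named field.
query = {"query": {"match": {"_all": "hot"}}}

print(json.dumps({"mappings": mappings}, indent=2))
print(json.dumps(query, indent=2))
```

With a mapping like this, the match query would target _all rather than name, so a prefix typed by the user could hit either the hotel name or the city.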

http://sense.qbox.io/gist/3583de02c4f7d33e07ba4c2def9badf90692a290