Elasticsearch completion suggester search with multi-word inputs

Date: 2015-04-20 17:00:47

Tags: elasticsearch autosuggest search-suggestion completion

Using the Elasticsearch completion suggester, I'm having trouble getting multi-word input suggestions returned for a single-word query.

Example structure:

PUT /test_index/
{
   "mappings": {
      "item": {
         "properties": {
            "test_suggest": {
               "type": "completion",
               "index_analyzer": "whitespace",
               "search_analyzer": "whitespace",
               "payloads": false
            }
         }
      }
   }
}

PUT /test_index/item/1
{
   "test_suggest": {
      "input": [
         "cat dog",
         "elephant"
      ]
   }
}

Working query:

POST /test_index/_suggest
{
    "test_suggest":{
        "text":"cat",
        "completion": {
            "field" : "test_suggest"
        }
    }
}

Result:

{
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "test_suggest": [
      {
         "text": "cat",
         "offset": 0,
         "length": 3,
         "options": [
            {
               "text": "cat dog",
               "score": 1
            }
         ]
      }
   ]
}

Failing query:

POST /test_index/_suggest
{
    "test_suggest":{
        "text":"dog",
        "completion": {
            "field" : "test_suggest"
        }
    }
}

Result:

{
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "test_suggest": [
      {
         "text": "dog",
         "offset": 0,
         "length": 3,
         "options": []
      }
   ]
}

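I would expect the same result as the working query, matching "cat dog". Any suggestions as to what the problem is and how to make the failing query work? I get the same result when using the standard analyzer instead of the whitespace analyzer. I would like to use multiple words in each input string, as in the example above.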

1 Answer:

Answer 0: (score: 11)

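The completion suggester is a prefix suggester, meaning it tries to match your query against the first few characters of each input it has been given. The text "dog" is not a prefix of "cat dog", so that input is never considered. If you want the document you posted to match the text "dog", you need to specify "dog" as an input as well.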

PUT /test_index/item/1
{
   "test_suggest": {
      "input": [
         "cat dog",
         "elephant",
         "dog"
      ]
   }
}
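
To see why the extra input is needed, the prefix rule can be mimicked in a few lines of Python. This is only a sketch of the matching behaviour, not how Elasticsearch implements it, and it ignores analysis entirely. `suggest` and `expand_inputs` are hypothetical helper names; `expand_inputs` shows one possible way to generate the extra inputs, by adding every suffix of a phrase that starts at a word boundary.

```python
def suggest(inputs, text):
    """Return the inputs that start with `text` (whole-input prefix match)."""
    return [candidate for candidate in inputs if candidate.startswith(text)]


def expand_inputs(phrase):
    """Return the phrase plus every suffix that starts at a word boundary."""
    words = phrase.split()
    return [" ".join(words[i:]) for i in range(len(words))]


# Inputs from the original document: "dog" is not a prefix of any of them.
original = ["cat dog", "elephant"]
print(suggest(original, "cat"))   # ['cat dog']
print(suggest(original, "dog"))   # []

# With the word-boundary suffixes added, "dog" becomes a matchable prefix.
expanded = expand_inputs("cat dog") + ["elephant"]
print(expanded)                   # ['cat dog', 'dog', 'elephant']
print(suggest(expanded, "dog"))   # ['dog']
```

The list produced by `expand_inputs` could be sent as the "input" array when indexing, at the cost of storing more inputs per document.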

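In my experience, having to specify every input you want to match makes the completion suggester less useful than other ways of implementing prefix matching. I like edge ngrams for this purpose. I recently wrote a blog post about using ngrams that you might find helpful: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch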

As a simple example, here is a mapping you could use:

PUT /test_index
{
   "settings": {
      "analysis": {
         "filter": {
            "edge_ngram_filter": {
               "type": "edge_ngram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "edge_ngram_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "edge_ngram_filter"
               ]
            }
         }
      }
   },
   "mappings": {
      "item": {
         "properties": {
            "text_field": {
               "type": "string",
               "index_analyzer": "edge_ngram_analyzer",
               "search_analyzer": "standard"
            }
         }
      }
   }
}

Then index the document:

PUT /test_index/item/1
{
   "text_field": [
      "cat dog",
      "elephant"
   ]
}

Any of these queries will then return it:

POST /test_index/_search
{
    "query": {
        "match": {
           "text_field": "dog"
        }
    }
}

POST /test_index/_search
{
    "query": {
        "match": {
           "text_field": "ele"
        }
    }
}

POST /test_index/_search
{
    "query": {
        "match": {
           "text_field": "ca"
        }
    }
}
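
Why these queries match can be sketched in Python. The code below only approximates the analyzer chain from the mapping: the standard tokenizer is replaced by a simple regex word split (an assumption that holds for plain words like these), followed by lowercasing and edge n-grams with min_gram 2 and max_gram 20. The function names are made up for illustration, and this is not Elasticsearch's implementation.

```python
import re


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text):
    """Approximate edge_ngram_analyzer: split into words, lowercase, n-gram."""
    terms = set()
    for word in re.findall(r"\w+", text.lower()):
        terms.update(edge_ngrams(word))
    return terms


def search_terms(text):
    """Approximate the standard search analyzer: split and lowercase only."""
    return re.findall(r"\w+", text.lower())


# Terms stored for the document indexed above.
indexed = set()
for value in ["cat dog", "elephant"]:
    indexed |= index_terms(value)

print(sorted(index_terms("cat dog")))  # ['ca', 'cat', 'do', 'dog']

for query in ["dog", "ele", "ca"]:
    # A match query hits when any search term is among the indexed terms.
    print(query, any(term in indexed for term in search_terms(query)))
```

Each of the three queries prints True in this sketch because every word is indexed under all of its prefixes of length 2 or more. A one-letter query such as "c" would not match, since min_gram is 2.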

Here is all the code:

http://sense.qbox.io/gist/4a08fbb6e42c34ff8904badfaaeecc01139f96cf