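I have been reading a lot, and it seems that EdgeNGrams are a good way to implement autocomplete for a search application. I have already configured EdgeNGrams in the settings for my index: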
PUT /bigtestindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "standard", "stop", "kstem", "ngram" ]
        }
      },
      "filter": {
        "edgengram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      },
      "highlight": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fields": {
          "title.autocomplete": {
            "number_of_fragments": 1,
            "fragment_size": 250
          }
        }
      }
    }
  }
}
So, with the EdgeNGram filter configured in my settings, how do I add it to the search query?
What I have so far is a match query with highlighting:
GET /bigtestindex/doc/_search
{
  "query": {
    "match": {
      "content": {
        "query": "thing and another thing",
        "operator": "and"
      }
    }
  },
  "highlight": {
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "field": {
      "_source.content": {
        "number_of_fragments": 1,
        "fragment_size": 250
      }
    }
  }
}
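How do I add autocomplete to the search query using the EdgeNGrams configured in my index settings?
UPDATE: For the mapping, would it be ideal to do something like this: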
"title": {
  "type": "string",
  "index_analyzer": "autocomplete",
  "search_analyzer": "standard"
},
Or do I need to use the multi_field type:
"title": {
  "type": "multi_field",
  "fields": {
    "title": {
      "type": "string"
    },
    "autocomplete": {
      "analyzer": "autocomplete",
      "type": "string",
      "index": "not_analyzed"
    }
  }
},
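I am using ES 1.4.1 and want to use the title field for autocomplete purposes.
Answer 0 (score: 1)
Short answer: you need to use the analyzer in a field mapping, as in: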
PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "stop",
            "kstem",
            "ngram"
          ]
        }
      },
      "filter": {
        "edgengram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "field1": {
          "type": "string",
          "index_analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
For more discussion, see:
http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
and
http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch
Also, I don't think you want the "highlight" section in your index definition; it belongs in the query.
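EDIT: When I tried out your code, I ran into a couple of problems. One is the highlight issue I already mentioned. The other is that you named your filter "edgengram" even though it is of type "ngram" rather than type "edgeNGram", and then your analyzer references the filter "ngram", which is the built-in default ngram filter rather than the one you defined. That probably won't give you what you want. (Hint: you can use term vectors to see what your analyzer is doing to your documents; you will probably want to turn them off in production, though.)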
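To make the difference between the two filter types concrete, here is a rough Python sketch of what each one emits for a single token. It is only an illustration, not Elasticsearch's implementation: the function names are made up, the token order may differ from what Lucene produces, and the 1/2 values passed to `ngrams` assume the built-in ngram filter defaults to min_gram 1 and max_gram 2.

```python
def ngrams(token, min_gram, max_gram):
    """Substrings of length min_gram..max_gram taken from every start position."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


def edge_ngrams(token, min_gram, max_gram):
    """Only the prefixes of length min_gram..max_gram (anchored at the start)."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


# Assumed defaults of the built-in "ngram" filter: min_gram=1, max_gram=2.
print(ngrams("world", 1, 2))

# The settings from the question, applied as an edge n-gram filter.
print(edge_ngrams("world", 2, 15))
```

The plain n-gram version produces short fragments from the middle of the word, so a query term can match almost anywhere. The edge version produces only prefixes, which is what type-ahead needs.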
So what you really want is probably something like this:
PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "stop",
            "kstem",
            "edgengram_filter"
          ]
        }
      },
      "filter": {
        "edgengram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "string",
          "index_analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
When I index these two docs:
POST test_index/doc/_bulk
{"index":{"_id":1}}
{"content":"hello world"}
{"index":{"_id":2}}
{"content":"goodbye world"}
and run this query (there was also an error in your "highlight" block; it should say "fields" rather than "field"):
POST /test_index/doc/_search
{
  "query": {
    "match": {
      "content": {
        "query": "good wor",
        "operator": "and"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "<em>"
    ],
    "post_tags": [
      "</em>"
    ],
    "fields": {
      "content": {
        "number_of_fragments": 1,
        "fragment_size": 250
      }
    }
  }
}
I get back this response, which seems to be what you are looking for, if I understand you correctly:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2712221,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.2712221,
        "_source": {
          "content": "goodbye world"
        },
        "highlight": {
          "content": [
            "<em>goodbye</em> <em>world</em>"
          ]
        }
      }
    ]
  }
}
Here is some code I used to test it:
http://sense.qbox.io/gist/3092992993e0328f7c4ee80e768dd508a0bc053f
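If it helps to see why "good wor" matches only the second document, here is a small Python simulation of the idea. It is a simplified sketch under stated assumptions, not what Elasticsearch actually runs: whitespace splitting stands in for the standard tokenizer, the stop and kstem filters are ignored, and all function names are hypothetical. Index-time terms are edge n-grams (min 2, max 15), while the query is only split into words, mirroring "index_analyzer": "autocomplete" with "search_analyzer": "standard".

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Prefixes of the token, as an edgeNGram filter would emit them."""
    longest = min(max_gram, len(token))
    return {token[:size] for size in range(min_gram, longest + 1)}


def index_terms(text):
    """Index-time analysis (simplified): lowercase, split, edge n-gram each word."""
    terms = set()
    for token in text.lower().split():
        terms |= edge_ngrams(token)
    return terms


def search_terms(text):
    """Search-time analysis (simplified): lowercase and split only, no n-grams."""
    return text.lower().split()


def matches(query, document):
    """Emulates a match query with operator "and": every query term must be indexed."""
    indexed = index_terms(document)
    return all(term in indexed for term in search_terms(query))


docs = {1: "hello world", 2: "goodbye world"}
hits = [doc_id for doc_id, content in docs.items() if matches("good wor", content)]
print(hits)  # [2]
```

Both documents contain the prefix "wor", but only "goodbye world" also contains "good", so the "and" operator leaves document 2 as the single hit. Note that a one-letter query such as "g" would match nothing here, because min_gram is 2.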