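I am trying to combine fuzzy search with highlighting and edge_ngram to get search-as-you-type behavior. Everything works except for one problem: even though I set min_gram to 1, the highlighting I get back grows in steps of 2-3 extra characters instead of one character at a time.
Test cases: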
+--------+---------------------------------------+---------------------------------------+
| Input  | Expected output                       | Actual output                         |
+--------+---------------------------------------+---------------------------------------+
| engin  | <em>Engin</em>eer                     | <em>Engine</em>er                     |
+--------+---------------------------------------+---------------------------------------+
| tell   | <em>Tell</em>er                       | <em>Telle</em>r                       |
+--------+---------------------------------------+---------------------------------------+
| engibe | <em>Engine</em>er                     | <em>Enginee</em>r                     |
+--------+---------------------------------------+---------------------------------------+
| pakk   | <em>Pack</em>er and <em>Pack</em>ager | <em>Pack</em>er and <em>Pack</em>ager |
+--------+---------------------------------------+---------------------------------------+
My query is as follows:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title.autocomplete": {
              "query": "engin"
            }
          }
        },
        {
          "match": {
            "title.autocomplete": {
              "query": "engin",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
When I use only the match clause without fuzziness, I get the correct highlighting.
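That difference suggests the fuzzy clause is what widens the highlight. The following is a minimal Python sketch of that hypothesis, not Elasticsearch code; the helper names `edge_ngrams`, `edit_distance`, `auto_fuzziness` and `highlight` are made up for illustration. It assumes that the query text is analyzed to a single lowercase token at search time, that `AUTO` allows 0 edits for terms of 1-2 characters, 1 edit for 3-5 characters and 2 edits for longer terms, and that the highlighter ends up marking the longest indexed prefix matched by any clause:

```python
def edge_ngrams(word, min_gram=1, max_gram=50):
    """Lowercased prefixes of `word`, mimicking edge_ngram on a single token."""
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(len(word), max_gram) + 1)]


def edit_distance(a, b):
    """Levenshtein distance that also counts an adjacent swap as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_fuzziness(term):
    """Edit budget assumed for fuzziness AUTO, based on term length."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def highlight(query, word):
    """Wrap the longest prefix of `word` that is within the edit budget of `query`."""
    term = query.lower()
    budget = auto_fuzziness(term)
    hits = [g for g in edge_ngrams(word) if edit_distance(term, g) <= budget]
    if not hits:
        return word
    end = len(max(hits, key=len))
    return "<em>" + word[:end] + "</em>" + word[end:]
```

Under those assumptions, `highlight("engin", "Engineer")` gives `<em>Engine</em>er` and `highlight("engibe", "Engineer")` gives `<em>Enginee</em>r`, which lines up with the "Actual output" column: the fuzzy clause also matches indexed prefixes that are one or two edits longer than the typed text.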
My highlighting configuration:
{
  "highlight": {
    "fields": {
      "title.autocomplete": {
        "pre_tags": "<em>",
        "post_tags": "</em>",
        "fragmenter": "simple",
        "type": "plain"
      }
    }
  }
}
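For reference, the query and the highlight section go into one search request body. A small Python sketch of how the two fragments fit together follows; the function name `build_search_body` is hypothetical, the tags are written in list form, and the closing tag is assumed to be `</em>`:

```python
import json


def build_search_body(text, field="title.autocomplete"):
    """Combine the exact and fuzzy match clauses with the highlight settings."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": text}}},
                    {"match": {field: {"query": text, "fuzziness": "AUTO"}}},
                ]
            }
        },
        "highlight": {
            "fields": {
                field: {
                    "pre_tags": ["<em>"],
                    "post_tags": ["</em>"],  # assumed closing tag
                    "fragmenter": "simple",
                    "type": "plain",
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the body of a _search request.
    print(json.dumps(build_search_body("engin"), indent=2))
```

Dropping the second entry of `should` turns this into the non-fuzzy variant that highlights correctly.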
My edge_ngram configuration:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "filter": ["lowercase"],
          "tokenizer": "whitespace"
        },
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": ["lowercase"]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 50,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}
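To sanity-check what these settings should put in the index, here is a rough Python emulation of the `autocomplete` analyzer (edge_ngram tokenizer with `token_chars` of letter and digit, then lowercase). It is only an approximation for illustration; the function name `edge_ngram_tokens` is made up, and the authoritative way to see the real tokens is the `_analyze` API on the index:

```python
import re


def edge_ngram_tokens(text, min_gram=1, max_gram=50):
    """Return (token, start_offset, end_offset) for every edge n-gram in `text`."""
    tokens = []
    # Runs of letters and digits form words; any other character splits them.
    for match in re.finditer(r"[^\W_]+", text):
        word = match.group().lower()
        longest = min(len(word), max_gram)
        for n in range(min_gram, longest + 1):
            tokens.append((word[:n], match.start(), match.start() + n))
    return tokens


if __name__ == "__main__":
    for token in edge_ngram_tokens("Teller"):
        print(token)
```

With min_gram set to 1 this yields every prefix length (`t`, `te`, `tel`, `tell`, `telle`, `teller`), each starting at the same offset, so the index side does contain a one-character step for every keystroke.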