Elasticsearch: highlight every character with fuzzy search and edge_ngram

Posted: 2019-03-18 15:24:44

Tags: php elasticsearch autocomplete elastic-stack

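I'm trying to combine fuzzy search with highlighting and edge_ngram to get a "search as you type" feature. Everything works except for one problem: even though I set min_gram to 1, the highlight often extends 2-3 characters past what was typed instead of covering exactly the typed characters. In some cases it is correct, as in the last row below.

Test cases: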

+--------+---------------------------------------+---------------------------------------+
| Input  | Expected output                       | Actual output                         |
+--------+---------------------------------------+---------------------------------------+
| engin  | <em>Engin</em>eer                     | <em>Engine</em>er                     |
+--------+---------------------------------------+---------------------------------------+
| tell   | <em>Tell</em>er                       | <em>Telle</em>r                       |
+--------+---------------------------------------+---------------------------------------+
| engibe | <em>Engine</em>er                     | <em>Enginee</em>r                     |
+--------+---------------------------------------+---------------------------------------+
| pakk   | <em>Pack</em>er and <em>Pack</em>ager | <em>Pack</em>er and <em>Pack</em>ager |
+--------+---------------------------------------+---------------------------------------+

My query:

{
   "query":{
      "bool":{
         "should":[
            {
               "match":{
                  "title.autocomplete":{
                     "query":"engin"
                  }
               }
            },
            {
               "match":{
                  "title.autocomplete":{
                     "query":"engin",
                     "fuzziness":"AUTO"
                  }
               }
            }
         ]
      }
   }
}
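
The query repeats the typed text in both clauses. A small builder avoids hard-coding "engin". Below is a minimal Python sketch, not the asker's PHP code; the dict maps one-to-one onto the JSON body. build_query is a hypothetical name. In a bool/should query the clause scores are summed, so a document that also matches the exact clause gets an extra score contribution.

```python
import json


def build_query(user_input, field="title.autocomplete"):
    """Hypothetical helper: exact match clause plus a fuzzy clause on the same field."""
    return {
        "query": {
            "bool": {
                "should": [
                    # exact (non-fuzzy) match on the autocomplete sub-field
                    {"match": {field: {"query": user_input}}},
                    # same text, allowing edits according to fuzziness AUTO
                    {"match": {field: {"query": user_input, "fuzziness": "AUTO"}}},
                ]
            }
        }
    }


print(json.dumps(build_query("engin"), indent=3))
```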

When I use only the first match clause, without fuzziness, the highlighting is correct.
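
One plausible explanation can be reproduced with the small Python simulation below. It is not Elasticsearch code. With fuzziness "AUTO" the query term is expanded to every indexed edge n-gram within the allowed edit distance: 1 edit for 3-5 characters and 2 edits for 6 or more. For example, "engin" is within one edit of both "engi" and "engine". All n-grams of a word share the same start offset. The plain highlighter then appears to mark one span running to the end of the longest matched n-gram.

The sketch rests on these assumptions:
- title.autocomplete is indexed with the autocomplete analyzer. The mapping is not shown in the question.
- Distance is plain Levenshtein. Elasticsearch also counts a transposition as one edit by default (fuzzy_transpositions), which does not change these examples.
- prefix_length and max_expansions are ignored.

All function names are hypothetical.

```python
import re


def edge_ngrams(text, min_gram=1, max_gram=50):
    """Approximate edge_ngram (token_chars letter/digit) + lowercase filter.
    Yields (term, start_offset, end_offset) for each prefix of each word."""
    for m in re.finditer(r"[^\W_]+", text):  # runs of letters/digits
        word = m.group().lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            yield word[:n], m.start(), m.start() + n


def levenshtein(a, b):
    """Plain edit distance (insert/delete/substitute); assumption, see lead-in."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def auto_fuzziness(term):
    """fuzziness AUTO (default AUTO:3,6): 0 edits below 3 chars, 1 for 3-5, 2 from 6."""
    n = len(term)
    return 0 if n < 3 else 1 if n < 6 else 2


def highlight(text, query, fuzzy=True, pre="<em>", post="</em>"):
    """Wrap the offsets of every matched n-gram, merging overlapping spans."""
    q = query.lower()
    max_edits = auto_fuzziness(q) if fuzzy else 0
    spans = sorted((s, e) for term, s, e in edge_ngrams(text)
                   if levenshtein(q, term) <= max_edits)
    merged = []
    for s, e in spans:
        if merged and s < merged[-1][1]:  # overlaps the previous span
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    out, pos = [], 0
    for s, e in merged:
        out.append(text[pos:s] + pre + text[s:e] + post)
        pos = e
    return "".join(out) + text[pos:]


for title, typed in [("Engineer", "engin"), ("Teller", "tell"),
                     ("Engineer", "engibe"), ("Packer and Packager", "pakk")]:
    print(typed, "->", highlight(title, typed))
```

The loop prints the same strings as the "Actual output" column in the table. With fuzzy=False only the exact n-gram matches, so highlight("Engineer", "engin", fuzzy=False) gives <em>Engin</em>eer. That agrees with the observation that the non-fuzzy clause alone highlights correctly.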

My highlight configuration:

{
   "highlight":{
      "fields":{
         "title.autocomplete":{
            "pre_tags":["<em>"],
            "post_tags":["</em>"],
            "fragmenter":"simple",
            "type":"plain"
         }
      }
   }
}
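
The query and the highlight section go together in one search request body. Below is a minimal Python sketch that adds the highlight section above to an existing body. add_highlight is a hypothetical helper. It passes pre_tags and post_tags as arrays, the form used in the Elasticsearch highlighting documentation, and closes the tag with </em>.

```python
import copy
import json


def add_highlight(search_body, field="title.autocomplete"):
    """Hypothetical helper: return a copy of search_body with a plain highlighter."""
    body = copy.deepcopy(search_body)  # leave the caller's dict untouched
    body["highlight"] = {
        "fields": {
            field: {
                "pre_tags": ["<em>"],
                "post_tags": ["</em>"],  # closing tag, not a second opening tag
                "fragmenter": "simple",
                "type": "plain",
            }
        }
    }
    return body


query_only = {"query": {"match": {"title.autocomplete": {"query": "engin"}}}}
print(json.dumps(add_highlight(query_only), indent=2))
```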

My edge_ngram configuration:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "custom_analyzer": {
                    "type": "custom",
                    "filter": ["lowercase"],
                    "tokenizer": "whitespace"
                },
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": ["lowercase"]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase"
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 50,
                    "token_chars": ["letter", "digit"]
                }
            }
        }
    }
}
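
To see which terms the autocomplete analyzer indexes, here is a rough Python approximation of it. It is not Elasticsearch code: it models an edge_ngram tokenizer with token_chars letter/digit followed by the lowercase filter. autocomplete_tokens is a hypothetical name. Every gram of a word shares the word's start offset, which is why overlapping matches end up inside a single highlight.

```python
import re


def autocomplete_tokens(text, min_gram=1, max_gram=50):
    """Approximate the autocomplete analyzer above.
    Returns (term, start_offset, end_offset) tuples."""
    tokens = []
    for m in re.finditer(r"[^\W_]+", text):  # runs of letters/digits split the text
        word = m.group()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append((word[:n].lower(), m.start(), m.start() + n))
    return tokens


print(autocomplete_tokens("Teller"))
# [('t', 0, 1), ('te', 0, 2), ('tel', 0, 3), ('tell', 0, 4), ('telle', 0, 5), ('teller', 0, 6)]
```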
