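Elasticsearch search: bool + must query

Date: 2018-12-03 14:17:51

Tags: elasticsearch search match

Can someone tell me why this Elasticsearch query returns the results below? The query contains a bool + must part which, as I understand it, should match only if the field nn contains an exact match of the string "softo". The query is: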

"query":{
        "bool":{
            "must":[
                {"match":{"nn":"softo"}}
            ],
            "should":[
                {"match":{"nn":"sro"}},
                {"match":{"nn":"as"}},
                {"match":{"nn":"no"}},
                {"match":{"nn":"vos"}},
                {"match":{"nn":"ks"}}
            ]
        }
    }
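To see why these results are surprising, it helps to model the bool query above. The following is a simplified sketch, not Elasticsearch's implementation. Each clause is a set of query tokens that matches if any of them occurs in the document (OR semantics). The number of matching should clauses stands in for the score. Every must clause is required. When a must clause is present, should clauses are optional (minimum_should_match defaults to 0) and only raise the score. The `doc` value is a hypothetical, unanalyzed "raw words" view of the first hit.

```python
def bool_query(doc_tokens, must, should):
    """Return (matches, number_of_matching_should_clauses)."""
    def clause_matches(clause):
        # A clause matches if ANY of its tokens occurs in the document.
        return bool(clause & doc_tokens)

    # Every must clause is required.
    if not all(clause_matches(c) for c in must):
        return False, 0
    should_hits = sum(clause_matches(c) for c in should)
    # minimum_should_match defaults to 0 when a must clause is present,
    # and to 1 when the bool query has only should clauses.
    min_should = 1 if should and not must else 0
    if should_hits < min_should:
        return False, 0
    return True, should_hits


# Hypothetical view of the first hit WITHOUT any analysis: plain words.
doc = set("zo soz kovo zts nova as zts elektronika as".split())
should = [{"sro"}, {"as"}, {"no"}, {"vos"}, {"ks"}]

print(bool_query(doc, [{"softo"}], should))  # (False, 0)
print(bool_query(doc, [{"as"}], should))     # (True, 1)
```

If neither the field nor the query text were analyzed, the must clause would reject this document. The explanation therefore has to lie in the analysis step.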

This query returns results whose nn field does not contain "softo" at all, for example:

            {
                "_index": "search_2",
                "_type": "doc",
                "_id": "17053188",
                "_score": 129.76167,
                "_source": {
                    "nn": "zo soz kovo zts nova as zts elektronika as",
                    "nazov": "ZO SOZ KOVO,ZŤS NOVA a.s.,ZTS ELEKTRONIKA a.s."
                }
            },
            {
                "_index": "search_2",
                "_type": "doc",
                "_id": "45732078",
                "_score": 126.953285,
                "_source": {
                    "nn": "agentura socialnych sluzieb   ass no",
                    "nazov": "Agentúra sociálnych služieb - ASS n.o."
                }
            }

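I don't understand this. Why does it return results such as "zo soz kovo zts nova as zts elektronika as", which do not contain the string "softo"? The mapping of the nn field is as follows: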

{
    "search_2": {
        "aliases": {
            "search": {}
        },
        "mappings": {
            "doc": {
                "dynamic": "strict",
                "properties": { 
                    "nn": {
                        "type": "text",
                        "boost": 10,
                        "analyzer": "autocomplete"
                    }
                }
            }
        },
        "settings": {
            "index": {
                "refresh_interval": "-1",
                "number_of_shards": "4",
                "provided_name": "search_2",
                "creation_date": "1539693645683",
                "analysis": {
                    "filter": {
                        "synonym_filter": {
                            "ignore_case": "true",
                            "type": "synonym",
                            "synonyms_path": "synonyms/sk_SK.txt"
                        },
                        "lemmagen_filter_sk": {
                            "type": "lemmagen",
                            "lexicon": "sk"
                        },
                        "stopwords_SK": {
                            "ignore_case": "true",
                            "type": "stop",
                            "stopwords_path": "stopwords/slovak.txt"
                        },
                        "remove_duplicities": {
                            "type": "unique",
                            "only_on_same_position": "true"
                        },
                        "autocomplete_filter": {
                            "type": "edge_ngram",
                            "min_gram": "2",
                            "max_gram": "20"
                        }
                    },
                    "analyzer": {
                        "autocomplete": {
                            "filter": [
                                "stopwords_SK",
                                "lowercase",
                                "stopwords_SK",
                                "autocomplete_filter"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "lower_ascii": {
                            "filter": [
                                "lowercase",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "suggestion": {
                            "filter": [
                                "stopwords_SK",
                                "lowercase",
                                "stopwords_SK",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "eyxXza0pQxWeQCpXih8ngg",
                "version": {
                    "created": "6020399"
                }
            }
        }
    }
}

2 answers:

Answer 0 (score: 4)

You get these results because of the autocomplete analyzer applied to the nn field. I'll explain using the following field value:

"nn": "zo soz kovo zts nova as zts elektronika as"

The tokens generated for it will be:

zo, so, soz, ko, kov, kovo, zt, zts, no, nov, nova, as, zt, zts, el, ele, elek, elekt, elektr, elektro, elektron, elektroni, elektronik, elektronika, as
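The token list above can be reproduced with a rough sketch of the autocomplete analyzer from the mapping. The sketch makes two assumptions. First, whitespace splitting stands in for the standard tokenizer, which is adequate for these already-normalized ASCII strings. Second, the stopwords_SK filter is skipped because the contents of stopwords/slovak.txt are not shown. The lowercase filter and the edge_ngram filter (min_gram 2, max_gram 20) are modeled as configured.

```python
def autocomplete(text, min_gram=2, max_gram=20):
    tokens = []
    for word in text.lower().split():  # lowercase filter
        # edge_ngram filter: prefixes of length min_gram..max_gram;
        # words shorter than min_gram produce no tokens at all.
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens


print(autocomplete("zo soz kovo zts nova as zts elektronika as"))
print(autocomplete("softo"))  # ['so', 'sof', 'soft', 'softo']
```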

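By default, a match query applies the same analyzer to the search text, and the default operator between the resulting tokens is OR. So {"match":{"nn":"softo"}} effectively behaves like this (pseudo-query):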

{
  "match": {
    "nn": "so OR sof OR soft OR softo"
  }
}

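As you can see, one of the tokens generated for the nn field is so, so the document matches. The second hit matches for the same reason: "socialnych" also produces the token so.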
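The OR behaviour described above can be checked with a small simulation of a match query on the nn field. This is a simplified model under stated assumptions: whitespace splitting replaces the standard tokenizer, stop words are ignored, and both the document and the query text go through the same lowercase + edge_ngram chain. The `operator="and"` branch models the match query's `operator` parameter, which requires every query token to be present.

```python
def autocomplete(text, min_gram=2, max_gram=20):
    # Assumed simplifications: whitespace split instead of the standard
    # tokenizer, stop-word filter omitted; lowercase + edge_ngram kept.
    return [w[:n] for w in text.lower().split()
            for n in range(min_gram, min(len(w), max_gram) + 1)]


def match(doc_text, query_text, operator="or"):
    query_tokens = set(autocomplete(query_text))
    shared = query_tokens & set(autocomplete(doc_text))
    if operator == "and":
        return shared == query_tokens  # every query token must be present
    return bool(shared)                # one shared token is enough


hit1 = "zo soz kovo zts nova as zts elektronika as"
hit2 = "agentura socialnych sluzieb   ass no"

for doc in (hit1, hit2):
    shared = set(autocomplete("softo")) & set(autocomplete(doc))
    print(match(doc, "softo"), shared)  # True {'so'}
```

With the default OR operator, the single shared token so is enough for both hits to satisfy the must clause.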

Answer 1 (score: 1)

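  1. You can change "match" to "term" in the must query.

    With a "match" query, the query text is run through the field's analyzer and a relevance score is computed, so the query answers the question "how well does this string match?".

    With a "term" query, the query text is not analyzed: the exact value is looked up among the indexed tokens, so the query answers a simple yes-or-no question: does the field contain this exact token or not? (Inside must, a term query still contributes to the score; only filter context skips scoring.)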


  2. If you really need full-text search, you can keep "match" in the must query and boost its score.

    For example, to boost it by a factor of 5:

    "must":[
        {"match": {"nn": {"boost": 5, "query": "softo"}}}
    ]
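The first suggestion (switching to "term") can be illustrated with the same kind of simplified model: whitespace tokenizer, no stop words, and lowercase + edge_ngram at index time. A term query skips analysis, so its value must equal one indexed token exactly. The documents "softo sro" and "softova firma" are hypothetical examples, not data from the question.

```python
def autocomplete(text, min_gram=2, max_gram=20):
    # Assumed index-time analysis: whitespace split, lowercase, edge_ngram.
    return [w[:n] for w in text.lower().split()
            for n in range(min_gram, min(len(w), max_gram) + 1)]


def term(doc_text, value):
    # term query: `value` is NOT analyzed; it must equal an indexed token.
    return value in set(autocomplete(doc_text))


hit1 = "zo soz kovo zts nova as zts elektronika as"
print(term(hit1, "softo"))             # False: only 'so' is shared
print(term("softo sro", "softo"))      # True (hypothetical document)
print(term("softova firma", "softo"))  # True: 'softo' is a prefix n-gram
print(term("softo sro", "Softo"))      # False: no lowercasing at query time
```

Because this field is indexed with edge n-grams, a term query for softo also matches any word that merely starts with softo. The term value must also already be lowercase.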