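Elasticsearch query from Python returns more answers than expected

Time: 2018-07-30 19:16:35

Tags: python elasticsearch

I have this Python code, in which I create a mapping for Elasticsearch and then search the content using the search query shown below:

Mapping: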

data_mapping = {
    "settings": {
        "analysis": {
            "analyzer": {
                "es_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "stop_words"
                    ]
                }
            },
            "filter": {
                "stop_words": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    },
    "mappings": {
        str(bot_name).lower(): {
            "properties": {
                "qid": {
                    "type": "string",
                    "fields": {
                        "stemmed": {
                            "type": "string"
                        }
                    }
                },
                "q": {
                    "type": "array",
                    "fields": {
                        "stemmed": {
                            "type": "string"
                        }
                    }
                },
                "a": {
                    "type": "string",
                    "fields": {
                        "stemmed": {
                            "type": "string"
                        }
                    }
                },
                "votes": {
                    "type": "integer",
                    "fields": {
                        "stemmed": {
                            "type": "integer"
                        }
                    }
                }
            }
        }
    }
}
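
A note on the mapping above, offered as a hedged sketch rather than a confirmed fix. The `stopwords` parameter belongs to Elasticsearch's `stop` token filter, not to a filter of type `standard`, and `es_analyzer` is defined but never attached to any field. Elasticsearch also has no `array` field type; any field can hold several values. The sketch below assumes the same older `string` field type used in the question. The helper name `build_mapping` and the added `lowercase` filter are assumptions, not part of the original code.

```python
def build_mapping(doc_type):
    """Return index settings and mappings as a plain dict (hypothetical helper)."""
    # Sub-field analyzed with the custom analyzer defined in the settings below.
    stemmed = {"type": "string", "analyzer": "es_analyzer"}
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "es_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first: the stop filter is case-sensitive by default.
                        "filter": ["lowercase", "stop_words"],
                    }
                },
                "filter": {
                    # "stop" is the token filter type that accepts "stopwords".
                    "stop_words": {"type": "stop", "stopwords": "_english_"}
                },
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    "qid": {"type": "string"},
                    # No "array" type: a string field can store a list of strings.
                    "q": {"type": "string", "fields": {"stemmed": stemmed}},
                    "a": {"type": "string", "fields": {"stemmed": stemmed}},
                    "votes": {"type": "integer"},
                }
            }
        },
    }
```

Assuming the same `es` client as in the question, the result could be passed as `es.indices.create(index=index_name, body=build_mapping(str(bot_name).lower()))`.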

Sample data for the above mapping:

{"qid":"1","q":["what can you tell me about Google Flag","I want to know about Google Flag","tell me about Google Flag","What is Google Flag"],"a":"Google is a search engine company based out of California USA.","votes":0}

{"qid":"2","q":["How is the Google Flag used"],"a":"Google flag is used search indexing.","votes":0}

{"qid":"3","q":["How is the Google Flag maintained"],"a":"Google means to search.","votes":0}

Query:

data = {
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "type": "most_fields",
                    "query": question,
                    "fields": ["q", "English"]
                }
            },
            "field_value_factor": {
                "field": "votes",
                "modifier": "log2p"
            }
        }
    }
}
response = es.search(index=str(index_name).lower(), body=data)

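In the query above, I am searching a question against the q field of the mapped content. When I search for What is google flag, ideally qid 1 should score highest, but qid 3 gets the highest score. However, when I search for What is google flag? (with the ? added), qid 1 gets the highest score. I cannot understand:

  1. Why did qid 3 get the highest score initially? My guess is that TF/IDF is overpowering everything else.

  2. Why does adding ? make qid 1 get the highest score?

  3. For point 1 above (searching What is google flag), what changes can I make to the mapping or the search query so that qid 1 scores highest? How can I force Elasticsearch to value a 100% match more when a one-to-one match exists?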
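
For point 3, one commonly used approach is sketched here under assumptions, not as a confirmed answer. It keeps the `multi_match` as the required clause and adds a `match_phrase` clause on `q` under `should`, so a document that contains the whole question as a phrase gets an extra boost. Setting `explain` to true in the body asks Elasticsearch to return a score breakdown per hit, which may help with points 1 and 2. The helper name `build_query` and the boost value of 10 are arbitrary assumptions, and the `English` field is left out because the mapping does not define it.

```python
def build_query(question, phrase_boost=10):
    """Build a search body that favours whole-phrase matches (hypothetical helper)."""
    return {
        # Ask for a per-hit explanation of how the score was computed.
        "explain": True,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        # Required clause: the term-level match from the question.
                        "must": [
                            {
                                "multi_match": {
                                    "type": "most_fields",
                                    "query": question,
                                    "fields": ["q"],
                                }
                            }
                        ],
                        # Optional clause: adds score when the full phrase occurs in q.
                        "should": [
                            {
                                "match_phrase": {
                                    "q": {"query": question, "boost": phrase_boost}
                                }
                            }
                        ],
                    }
                },
                "field_value_factor": {"field": "votes", "modifier": "log2p"},
            }
        },
    }
```

With the `es` client from the question, this could be run as `response = es.search(index=str(index_name).lower(), body=build_query(question))`; each hit should then include an `_explanation` entry to inspect.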

0 answers:

No answers yet