I have this Python code in which I create a mapping for Elasticsearch and then search the content using the search query shown below:
Mapping:
data_mapping = {
    "settings": {
        "analysis": {
            "analyzer": {
                "es_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "stop_words"
                    ]
                }
            },
            "filter": {
                "stop_words": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    },
    "mappings": {
        str(bot_name).lower(): {
            "properties": {
                "qid": {
                    "type": "string",
                    "fields": {
                        "stemmed": {
                            "type": "string"
                        }
                    }
                },
                "q": {
                    "type": "array",
                    "fields": {
                        "stemmed": {
                            "type": "string"
                        }
                    }
                },
                "a": {
                    "type": "string",
                    "fields": {
                        "stemmed": {
                            "type": "string"
                        }
                    }
                },
                "votes": {
                    "type": "integer",
                    "fields": {
                        "stemmed": {
                            "type": "integer"
                        }
                    }
                }
            }
        }
    }
}
Sample data indexed with the above mapping:
{"qid":"1","q":["what can you tell me about Google Flag","I want to know about Google Flag","tell me about Google Flag","What is Google Flag"],"a":"Google is a search engine company based out of California USA.","votes":0}
{"qid":"2","q":["How is the Google Flag used"],"a":"Google flag is used search indexing.","votes":0}
{"qid":"3","q":["How is the Google Flag maintained"],"a":"Google means to search.","votes":0}
Query:
data = {
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "type": "most_fields",
                    "query": question,
                    "fields": ["q", "English"]
                }
            },
            "field_value_factor": {
                "field": "votes",
                "modifier": "log2p"
            }
        }
    }
}
response = es.search(index=str(index_name).lower(), body=data)
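As a side check on the field_value_factor part of the query, here is a rough sketch of what the "log2p" modifier should contribute. It assumes the documented behaviour of log2p (common logarithm of the field value plus 2) and the default factor of 1; the helper name log2p_factor is made up for illustration. Since every sample document has votes = 0, the factor comes out the same for all three, so under those assumptions it should not change their relative order.

```python
import math


def log2p_factor(votes, factor=1.0):
    """Approximate field_value_factor with modifier "log2p": log10(factor * votes + 2).

    Hypothetical helper, not part of the Elasticsearch client.
    """
    return math.log10(factor * votes + 2)


# votes values taken from the three sample documents (all 0)
sample_votes = {"1": 0, "2": 0, "3": 0}
factors = {qid: log2p_factor(v) for qid, v in sample_votes.items()}
print(factors)
```

If that holds, the ranking difference has to come from the multi_match part of the query rather than from the votes field.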
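In the above query, I am searching for a question against the "q" field of the indexed content. Now, when I search for "What is google flag", ideally the document with qid 1 should score highest on its "q" field, but qid 3 gets the highest score. However, when I search for "What is google flag?" (with the "?" added), qid 1 gets the highest score. I am not able to understand:
1. Why does qid 3 score highest at first? My guess is that TF/IDF is overpowering everything else.
2. Why does adding "?" make qid 1 score highest?
For point 1 above (searching for "What is google flag"), what changes can I make to the mapping or the search query so that qid 1 scores highest? How can I force Elasticsearch to weight 100% matches more heavily (when a one-to-one match exists)?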
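One way to dig into the scores might be to ask Elasticsearch for a per-document score breakdown. The sketch below only builds the request body: it wraps the same function_score query shown above and adds the top-level "explain" flag of the search API. The helper name build_explain_body is made up for illustration, and the commented-out line shows how the body would be passed to the es client from my code.

```python
import json


def build_explain_body(question):
    """Return the search body from the question with "explain" switched on.

    Hypothetical helper; only the "explain" key is new compared to the query above.
    """
    return {
        "explain": True,
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "type": "most_fields",
                        "query": question,
                        "fields": ["q", "English"],
                    }
                },
                "field_value_factor": {"field": "votes", "modifier": "log2p"},
            }
        },
    }


body = build_explain_body("What is google flag")
print(json.dumps(body, indent=2))
# response = es.search(index=str(index_name).lower(), body=body)
# Each hit in response["hits"]["hits"] should then carry an "_explanation" entry.
```

Running this once with and once without the trailing "?" and comparing the "_explanation" entries for qid 1 and qid 3 should show which terms drive the difference.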