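I am trying to implement exact-match search in Elasticsearch, but I am not getting the results I want. Here is the code that shows the problem I am facing and what I have tried.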
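from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a node reachable at the client's default address
doc1 = {"sentence": "Today is a sunny day."}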
doc2 = {"sentence": " Today is a sunny day but tomorrow it might rain"}
doc3 = {"sentence": "I know I am awesome"}
doc4 = {"sentence": "The taste of your dish is awesome"}
doc5 = {"sentence": "The taste of banana shake is good"}
# Indexing the above docs
es.index(index="english",doc_type="sentences",id=1,body=doc1)
es.index(index="english",doc_type="sentences",id=2,body=doc2)
es.index(index="english",doc_type="sentences",id=3,body=doc3)
es.index(index="english",doc_type="sentences",id=4,body=doc4)
es.index(index="english",doc_type="sentences",id=5,body=doc5)
Query 1
res = es.search(index="english",body={"from":0,"size":5,
"query":
{"match_phrase":
{"sentence":{"query":"Today is a sunny day"}
}},
"explain":False})
Query 2
res = es.search(index="english",body={"from":0,"size":5,
"query":{
"bool":{
"must":{
"match_phrase":
{"sentence":{"query":"Today is a sunny day"}
}},
"filter":{
"term":{
"sentence.word_count": 5}},
}
}
})
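So when I run Query 1, I get doc2 as the top result, whereas I want doc1 to be the top result.
When I try the same thing with a filter (limiting the length of the match to the length of the query), as shown in Query 2, I get no results at all.
I would really appreciate any help with this. I want an exact match for the given query, not a match that merely contains it.
Thanks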
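Answer 0 (score: 1)
My gut feeling is that your index has 5 primary shards and that you don't have enough documents for the scores to be meaningful. If you create the index with a single primary shard, your first query returns the document you expect. You can read more about why this happens in the following article: https://www.elastic.co/blog/practical-bm25-part-1-how-shards-affect-relevance-scoring-in-elasticsearch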
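To make the single-shard suggestion concrete, here is a minimal Python sketch of an index-creation body with one primary shard. The helper name `single_shard_index_body` and the `replicas` parameter are made up for illustration; only the `number_of_shards` and `number_of_replicas` settings come from Elasticsearch itself.

```python
def single_shard_index_body(replicas=0):
    """Build an index-creation body that uses exactly one primary shard.

    Hypothetical helper: with a single shard, term statistics are computed
    over all documents, so scores on a tiny index are comparable.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": replicas,
            }
        }
    }


body = single_shard_index_body()
print(body)

# With the client object from the question, the body would be passed as:
# es.indices.create(index="english", body=body)
```

If recreating the index is not an option, running the search with `search_type="dfs_query_then_fetch"` is another way to get scores based on index-wide term statistics, at the cost of an extra round trip.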
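One way to get what you want is to use the keyword type, but with a normalizer that lowercases the data, so that you can search for exact matches in a case-insensitive way.
Create your index like this: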
PUT english
{
"settings": {
"analysis": {
"normalizer": {
"lc_normalizer": {
"type": "custom",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"sentences": {
"properties": {
"sentence": {
"type": "text",
"fields": {
"exact": {
"type": "keyword",
"normalizer": "lc_normalizer"
}
}
}
}
}
}
}
Then you can index your documents as usual.
PUT english/sentences/1
{"sentence": "Today is a sunny day"}
PUT english/sentences/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
...
Finally, you can search for an exact phrase match. The query below returns only doc1:
POST english/_search
{
"query": {
"match": {
"sentence.exact": "today is a sunny day"
}
}
}
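The behaviour of the `sentence.exact` sub-field can be pictured with a small pure-Python sketch. It does not talk to Elasticsearch; `normalize` and `exact_matches` are hypothetical stand-ins that only mimic a keyword field with a lowercase normalizer, where the whole value is compared as a single term.

```python
def normalize(value):
    # Stand-in for "lc_normalizer": lowercase only, no tokenizing.
    return value.lower()


def exact_matches(docs, query):
    """Return ids of docs whose whole normalized value equals the normalized query."""
    wanted = normalize(query)
    return [doc_id for doc_id, text in docs.items() if normalize(text) == wanted]


docs = {
    1: "Today is a sunny day",
    2: "Today is a sunny day but tomorrow it might rain",
    3: "Today is a sunny day.",  # doc1 as written in the question, with the period
}

print(exact_matches(docs, "today is a sunny day"))  # → [1]
```

Note that the third entry does not match: the doc1 in the question ends with a period, so with this approach the stored value and the query need the same punctuation, or the punctuation has to be stripped before indexing.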
Answer 1 (score: 0)
This query will work:
{
"query":{
"match_phrase":{
"sentence":{
"query":"Today is a sunny day"
}
}
},
"size":5,
"from":0,
"explain":false
}
Answer 2 (score: 0)
Try using a bool query:
PUT test_index/doc/1
{"sentence": "Today is a sunny day"}
PUT test_index/doc/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
# terms query for an exact match on the keyword sub-field, plus a multi_match of type phrase for the other matches
GET test_index/_search
{
"query": {
"bool": {
"should": [
{
"terms": {
"sentence.keyword": [
"Today is a sunny day"
]
}
},
{
"multi_match":{
"query":"Today is a sunny day",
"type":"phrase",
"fields":[
"sentence"
]
}
}
]
}
}
}
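Another option is to use multi_match for both clauses: match on the keyword field first with a boost of 5, then match on the analyzed field without a boost: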
PUT test_index/doc/1
{"sentence": "Today is a sunny day"}
PUT test_index/doc/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
GET test_index/_search
{
"query":{
"bool":{
"should":[
{
"multi_match":{
"query":"Today is a sunny day",
"type":"phrase",
"fields":[
"sentence.keyword"
],
"boost":5
}
},
{
"multi_match":{
"query":"Today is a sunny day",
"type":"phrase",
"fields":[
"sentence"
]
}
}
]
}
}
}
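Since the question uses the Python client, the boosted query above could be built as a plain dict along these lines. This is only a sketch: the helper `exact_first_query` and its parameters are made up for illustration, and it assumes the `sentence.keyword` sub-field exists, as it does under the default dynamic mapping for string fields.

```python
def exact_first_query(text, field="sentence", boost=5):
    """Build a bool/should body that ranks whole-value matches first.

    Hypothetical helper mirroring the JSON above: the first clause targets
    the keyword sub-field with a boost, the second is a plain phrase match.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "type": "phrase",
                            "fields": [field + ".keyword"],
                            "boost": boost,
                        }
                    },
                    {
                        "multi_match": {
                            "query": text,
                            "type": "phrase",
                            "fields": [field],
                        }
                    },
                ]
            }
        }
    }


body = exact_first_query("Today is a sunny day")
print(body)

# With the client object from the question, this would be run as:
# res = es.search(index="test_index", body=body)
```

A document whose whole value equals the query matches both clauses, so its scores add up and it ranks above documents that only contain the phrase. Keep in mind that the default `keyword` sub-field is case-sensitive, unlike the normalizer-based field in Answer 0.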