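How do I do exact phrase matching in Elasticsearch?

Time: 2018-09-06 14:57:59

Tags: elasticsearch match-phrase

I am trying to implement exact match search in Elasticsearch, but I am not getting the results I want. Here is the code that illustrates the problem I am facing and the things I have tried.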

from elasticsearch import Elasticsearch

# Assumption: a node reachable at the client's default address; adjust for your cluster
es = Elasticsearch()

doc1 = {"sentence": "Today is a sunny day."}
doc2 = {"sentence": " Today is a sunny day but tomorrow it might rain"}
doc3 = {"sentence": "I know I am awesome"}
doc4 = {"sentence": "The taste of your dish is awesome"}
doc5 = {"sentence": "The taste of banana shake is good"}

# Index the documents above
es.index(index="english", doc_type="sentences", id=1, body=doc1)
es.index(index="english", doc_type="sentences", id=2, body=doc2)
es.index(index="english", doc_type="sentences", id=3, body=doc3)
es.index(index="english", doc_type="sentences", id=4, body=doc4)
es.index(index="english", doc_type="sentences", id=5, body=doc5)

Query 1

res = es.search(index="english", body={
    "from": 0,
    "size": 5,
    "query": {
        "match_phrase": {
            "sentence": {"query": "Today is a sunny day"}
        }
    },
    "explain": False
})

Query 2

res = es.search(index="english", body={
    "from": 0,
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "match_phrase": {
                    "sentence": {"query": "Today is a sunny day"}
                }
            },
            "filter": {
                "term": {"sentence.word_count": 5}
            }
        }
    }
})

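So when I run Query 1, I get doc2 as the top result, whereas I want doc1 to be the top result.

When I try the same thing with a filter (restricting the length of the matched sentence to the length of the query), as shown in Query 2, I get no results at all.

I would really appreciate any help in solving this. I want an exact match for the given query, not a match that merely contains the query.

Thanks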

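3 Answers:

Answer 0: (score: 1)

My gut tells me that your index has 5 primary shards and that you don't have enough documents for the scores to be meaningful, because each shard scores its documents using only its own term statistics. If you create your index with a single primary shard, your first query will return the document you expect. You can read more about why this happens in the following article: https://www.elastic.co/blog/practical-bm25-part-1-how-shards-affect-relevance-scoring-in-elasticsearch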
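
To see why the shard count can change the ranking, here is a rough sketch of the IDF part of BM25 in the form Lucene uses, evaluated once with index-wide statistics and once with per-shard statistics. The shard layout (doc1 alone on one shard, doc2 sharing a shard with one unrelated document) is a hypothetical assumption for illustration, not necessarily what your cluster did, and `bm25_idf` is a helper name made up for this example.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term of BM25 in Lucene's form: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# "today" occurs in doc1 and doc2 out of the 5 documents in the question.
idf_single_shard = bm25_idf(doc_count=5, doc_freq=2)

# Hypothetical 5-shard layout: doc1 sits alone on its shard, while doc2
# shares a shard with one document that does not contain "today".
idf_doc1_shard = bm25_idf(doc_count=1, doc_freq=1)
idf_doc2_shard = bm25_idf(doc_count=2, doc_freq=1)

print(f"one shard, both docs: {idf_single_shard:.3f}")
print(f"doc1's own shard:     {idf_doc1_shard:.3f}")
print(f"doc2's own shard:     {idf_doc2_shard:.3f}")

# Index settings that keep all documents on one primary shard, so every
# document is scored against the same statistics.
single_shard_settings = {"settings": {"number_of_shards": 1}}
```

With a single shard, both documents get the same IDF for each query term, so the shorter doc1 would normally rank first thanks to length normalization. In the hypothetical layout, doc2's terms are weighted more heavily than doc1's, which can be enough to push doc2 to the top. With the elasticsearch-py client from the question, `single_shard_settings` would typically be passed as the `body` of `es.indices.create(index="english", ...)`, assuming a client version that still accepts `body`.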

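One way to achieve what you want is to use the keyword type together with a normalizer that lowercases the data, so that you can search for exact matches in a case-insensitive way.

Create your index like this: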

PUT english
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lc_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "sentences": {
      "properties": {
        "sentence": {
          "type": "text",
          "fields": {
            "exact": {
              "type": "keyword",
              "normalizer": "lc_normalizer"
            }
          }
        }
      }
    }
  }
}

Then you can index your documents as usual.

PUT english/sentences/1
{"sentence": "Today is a sunny day"}
PUT english/sentences/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
...

Finally, you can search for an exact phrase match. The query below will return only doc1:

POST english/_search
{
  "query": {
    "match": {
      "sentence.exact": "today is a sunny day"
    }
  }
}
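
As a rough illustration of what the `lc_normalizer` mapping does to the `sentence.exact` sub-field, the sketch below imitates it in plain Python: the whole string is kept as a single token and only lowercased. The helper names are hypothetical, `str.lower()` only approximates the `lowercase` filter, and doc 3 is an extra document added here to show that punctuation is not stripped.

```python
def lc_normalize(value: str) -> str:
    """Imitate a keyword normalizer whose only filter is `lowercase`."""
    return value.lower()


def exact_matches(query: str, docs: dict) -> list:
    """Return ids of documents whose normalized value equals the normalized query."""
    wanted = lc_normalize(query)
    return [doc_id for doc_id, text in docs.items() if lc_normalize(text) == wanted]


docs = {
    1: "Today is a sunny day",
    2: "Today is a sunny day but tomorrow it might rain",
    3: "Today is a sunny day.",  # extra document with a trailing period
}

# Case differences are ignored, but the whole value must match.
print(exact_matches("today is a sunny day", docs))
print(exact_matches("TODAY IS A SUNNY DAY", docs))
```

Both calls select only document 1. Note that doc1 in the question ends with a period, so under this mapping it would only match a query that includes the period as well.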

Answer 1: (score: 0)

This query will work:

{
    "query":{
        "match_phrase":{
            "sentence":{
                "query":"Today is a sunny day"
            }
        }
    },
    "size":5,
    "from":0,
    "explain":false
}

Answer 2: (score: 0)

Try using a bool query:

    PUT test_index/doc/1
    {"sentence": "Today is a sunny day"}

    PUT test_index/doc/2
    {"sentence": "Today is a sunny day but tomorrow it might rain"}

    # terms query for an exact match on the keyword sub-field, plus multi_match (phrase) for other matches
    GET test_index/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "terms": {
                "sentence.keyword": [
                  "Today is a sunny day"
                ]
              }
            },
            {  
              "multi_match":{  
                "query":"Today is a sunny day",
                "type":"phrase",
                "fields":[  
                    "sentence"
                ]
              }
            }
          ]
        }
      }
    }
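
Since the question uses the Python client, the same request body can be built as a dictionary. This is a minimal sketch: `exact_first_query` and its parameters are hypothetical names, and it assumes `sentence` was mapped dynamically so that a `sentence.keyword` sub-field exists.

```python
def exact_first_query(text: str, field: str = "sentence",
                      keyword_subfield: str = "keyword") -> dict:
    """Build a bool/should body: exact keyword match plus phrase match."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches only when the whole stored value equals `text`
                    {"terms": {f"{field}.{keyword_subfield}": [text]}},
                    # Matches any document containing `text` as a phrase
                    {"multi_match": {"query": text, "type": "phrase", "fields": [field]}},
                ]
            }
        }
    }


body = exact_first_query("Today is a sunny day")
# With a connected client, e.g.: res = es.search(index="test_index", body=body)
```

A document that matches both `should` clauses receives the sum of their scores, which is what lifts the exact match above documents that only contain the phrase.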

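Another option is to use multi_match for both clauses: match against the keyword sub-field first with a boost of 5, and leave the other match unboosted: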

PUT test_index/doc/1
{"sentence": "Today is a sunny day"}

PUT test_index/doc/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}


GET test_index/_search
{  
  "query":{  
    "bool":{  
      "should":[  
        {  
          "multi_match":{  
            "query":"Today is a sunny day",
            "type":"phrase",
            "fields":[  
              "sentence.keyword"
            ],
            "boost":5
          }
        },
        {  
          "multi_match":{  
            "query":"Today is a sunny day",
            "type":"phrase",
            "fields":[  
                "sentence"
            ]
          }
        }
      ]
    }
  }
}