Question

We have an Elasticsearch index with the following configuration:

PUT phonebook
{
   "settings":{
      "index":{
         "number_of_shards":8,
         "number_of_replicas":1
      }
   },
   "mappings":{
      "person":{
         "_all":{
            "enabled":false
         },
         "_source":{
            "enabled":true
         },
         "properties":{
            "id":{
               "type":"long"
            },
            "name":{
               "type":"text",
               "index_options":"positions"
            },
            "number":{
               "type":"long"
            }
         }
      }
   }
}

It is basically a huge phone book with billions of records. I search this index with the following query:

GET /phonebook/person/_search
{
   "size":0,
   "query":{
      "match":{
         "name":{
            "fuzziness":1,
            "query":"george bush",
            "operator":"and"
         }
      }
   },
   "aggs":{
      "by_number":{
         "terms":{
            "field":"number",
            "size":10,
            "order":{
               "max_score":"desc"
            }
         },
         "aggs":{
            "max_score":{
               "max":{
                  "script":"_score"
               }
            },
            "sample":{
               "top_hits":{
                  "size":1
               }
            }
         }
      }
   }
}

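The results are grouped by the field "number", so the best match for each number is returned. What I need, however, is custom scoring/sorting of the results based on how correct the word order is. For the query "george bush", "George Bush" should always score higher than "Bush George". A match_phrase search is not suitable for me because I use fuzziness in my search.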

Answer 1

How about something like this:

  "query":{
    "simple_query_string": {
      "query": "\"barack~ obama~\"~3",
      "fields": ["name"]
    }    
  },

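The trailing ~ after each token is for the fuzzy aspect, and the ~3 after the phrase handles slop, which I think is the concept you are looking for with phrase queries. I think the results would then be scored so that "Barack Obama" scores higher than "Obama Barack". You could also come up with a custom bool query that mimics this, where the should clauses handle the fuzzy and slop aspects.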
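
A custom bool query along those lines might look roughly like the sketch below. It is not from the original answer; it assumes the question's index, with the `name` field analyzed by the default standard analyzer and indexed with positions (which span queries need). The must clause is the question's fuzzy match, unchanged. The should clause uses a `span_near` query whose clauses are fuzzy queries wrapped in `span_multi`, with `in_order: true`, so it only adds score when both (fuzzily matched) words appear next to each other in the query's order. The hard-coded terms "george" and "bush" are illustrative; fuzzy queries are not analyzed, so the terms have to be lowercased by the client.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": {
            "query": "george bush",
            "fuzziness": 1,
            "operator": "and"
          }
        }
      },
      "should": {
        "span_near": {
          "clauses": [
            { "span_multi": { "match": { "fuzzy": { "name": { "value": "george", "fuzziness": 1 } } } } },
            { "span_multi": { "match": { "fuzzy": { "name": { "value": "bush", "fuzziness": 1 } } } } }
          ],
          "slop": 0,
          "in_order": true
        }
      }
    }
  }
}
```

Documents like "Bush George" still match through the must clause but get no boost from the should clause, so "George Bush" should rank above them. The question's `aggs` block can be added to this body unchanged; since it orders buckets by the maximum `_score`, the boost should carry through to the per-number results. Raising `slop` would also reward in-order matches with words in between, and setting `in_order` to false would drop the ordering requirement.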

Some resources:

Simple Query String
Mixing It Up - Elasticsearch documentation on slop

Scoring results by correct word order in Elasticsearch