Question

I am testing the dis_max query with the following documents:

PUT /blog/post/1
{
    "title": "Quick brown rabbits",
    "body":  "Brown rabbits are commonly seen."
}
PUT /blog/post/2
{
    "title": "Keeping pets healthy",
    "body":  "My quick brown fox eats rabbits on a regular basis."
}

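This example is taken from the book "Elasticsearch: The Definitive Guide", which explains that the query below should return both documents with an equal _score.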

{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ]
        }
    }
}
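For reference, here is a minimal Python sketch of how dis_max is documented to combine the scores of its clauses: it takes the best clause score and adds the other matching scores scaled by tie_breaker (which defaults to 0.0). The function name and the clause scores are hypothetical, chosen only for illustration; they are not values returned by Elasticsearch.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine per-clause scores: best score plus tie_breaker times the rest."""
    # Only clauses that actually matched (score > 0) contribute.
    matching = [s for s in clause_scores if s > 0.0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical scores for the title clause and the body clause.
title_score, body_score = 0.3, 0.2
print(dis_max_score([title_score, body_score]))                   # best clause only
print(dis_max_score([title_score, body_score], tie_breaker=0.5))  # best + half of the rest
```

With the default tie_breaker, two documents get the same dis_max score whenever their best clauses score the same, which is what the book's example relies on.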

However, as you can see, the query results show different _score values.

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.02250402,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "2",
      "_score" : 0.02250402,
      "_source" : {
        "title" : "Keeping pets healthy",
        "body" : "My quick brown fox eats rabbits on a regular basis."
      }
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 0.016645055,
      "_source" : {
        "title" : "Quick brown rabbits",
        "body" : "Brown rabbits are commonly seen."
      }
    } ]
  }
}

Instead of returning the _score of the best-matching clause, Elasticsearch seems to be blending the results somehow. How can I fix this?

Answer 1

I found the answer.

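This confusing behavior occurs because the index used in the example has 5 shards (the default number of shards). The _score is not computed over the index as a whole; each shard computes it from its own local term statistics, and the per-shard results are then merged before the response is returned to the user.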
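To make the effect concrete, here is a rough Python sketch of how the inverse document frequency (IDF) of a single term changes with the shard layout. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and it assumes the two documents landed on different shards; both are assumptions on my part, and the sketch does not try to reproduce the exact scores shown above.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as defined by Lucene's classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Title term "pets": it appears only in the title of document 2.

# One shard holding both documents: 2 docs, 1 of them contains the term.
single_shard = classic_idf(doc_freq=1, num_docs=2)

# Assumed multi-shard layout: each document sits alone on its own shard.
shard_with_doc2 = classic_idf(doc_freq=1, num_docs=1)  # term present
shard_with_doc1 = classic_idf(doc_freq=0, num_docs=1)  # term absent

print(single_shard, shard_with_doc2, shard_with_doc1)
```

The same term gets a different IDF depending on which shard evaluates it, so the weights that feed into the final _score no longer agree across shards.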

This is not a problem when you have a large number of documents, but that is not my case.
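A small Python sketch of why the document count matters: if documents are spread evenly across shards, the IDF seen by one shard approaches the index-wide IDF as the index grows. The classic TF-IDF formula, the even spread, and the document counts below are all assumptions made for illustration.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF as defined by Lucene's classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def idf_gap(total_docs, docs_with_term, shards=5):
    """Gap between the index-wide IDF and one shard's IDF, assuming an even spread."""
    global_idf = classic_idf(docs_with_term, total_docs)
    shard_idf = classic_idf(docs_with_term // shards, total_docs // shards)
    return abs(global_idf - shard_idf)


# Hypothetical index sizes: a tiny index versus a large one.
print(idf_gap(total_docs=10, docs_with_term=5))
print(idf_gap(total_docs=100_000, docs_with_term=10_000))
```

The gap is large for the tiny index and almost disappears for the large one, which is why the skew only shows up in small test indices like mine.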

So, to test my hypothesis, I deleted my index:

DELETE /blog

Then I created a new index with only 1 shard:

PUT /blog
{ "settings" : { "number_of_shards" : 1 }}

I then ran my query again and got the same _score for both documents: 0.12713557

Sweet =）
