Question

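I'm using Elasticsearch 6.6 and have an index (1 shard, 1 replica) containing the GeoNames dataset (https://www.geonames.org/) (index size = 1.3 GB, 11.8 million geopoints). I was experimenting with geo-distance sort queries that sort the entire index by distance from some origin. After a few tests, I found that an ascending sort is always faster than a descending sort. Here is an example query (I also tested with larger "size" values):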

POST /geonames/_search?request_cache=false
{   
    "size":1,
    "sort" : [
        {
            "_geo_distance" : {
                "location" : [8, 49],
                "order" : "asc",
                "unit" : "m",
                "mode" : "min",
                "distance_type" : "arc",
                "ignore_unmapped": true
            }
        }
    ]
}
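
To make sure the two runs differ only in the sort order, both request bodies can be generated from one template. Below is a minimal Python sketch using only the standard library. `geo_sort_body` is a hypothetical helper, and the resulting JSON would be sent to `POST /geonames/_search?request_cache=false` as above. In the array form `[8, 49]`, Elasticsearch reads the point as `[lon, lat]`, which is consistent with the nearest hit lying at lat 49.009, lon 8.015.

```python
import json


def geo_sort_body(order, origin=(8, 49), size=1):
    """Build the _geo_distance sort body used above (hypothetical helper).

    origin is given as [lon, lat], matching the array form in the query.
    """
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "size": size,
        "sort": [
            {
                "_geo_distance": {
                    "location": list(origin),
                    "order": order,
                    "unit": "m",
                    "mode": "min",
                    "distance_type": "arc",
                    "ignore_unmapped": True,
                }
            }
        ],
    }


asc_body = geo_sort_body("asc")
desc_body = geo_sort_body("desc")
print(json.dumps(desc_body, indent=2))
```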

Here is the response for the ascending sort (with explain and profile set to true):

{
  "took" : 1374,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 11858060,
    "max_score" : null,
    "hits" : [
      {
        "_shard" : "[geonames][0]",
        "_node" : "qXTymyB9QLmxhPtGEtA_mA",
        "_index" : "geonames",
        "_type" : "doc",
        "_id" : "L781LmkBrQo0YN4qP48D",
        "_score" : null,
        "_source" : {
          "id" : "3034701",
          "name" : "Forêt de Wissembourg",
          "location" : {
            "lat" : "49.00924",
            "lon" : "8.01542"
          }
        },
        "sort" : [
          1523.4121312414704
        ],
        "_explanation" : {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[qXTymyB9QLmxhPtGEtA_mA][geonames][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "MatchAllDocsQuery",
                "description" : "*:*",
                "time_in_nanos" : 265223567,
                "breakdown" : {
                  "score" : 0,
                  "build_scorer_count" : 54,
                  "match_count" : 0,
                  "create_weight" : 10209,
                  "next_doc" : 253091268,
                  "match" : 0,
                  "create_weight_count" : 1,
                  "next_doc_count" : 11858087,
                  "score_count" : 0,
                  "build_scorer" : 263948,
                  "advance" : 0,
                  "advance_count" : 0
                }
              }
            ],
            "rewrite_time" : 1097,
            "collector" : [
              {
                "name" : "CancellableCollector",
                "reason" : "search_cancelled",
                "time_in_nanos" : 1044167746,
                "children" : [
                  {
                    "name" : "SimpleFieldCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 508296683
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

And here is the descending sort, obtained by simply switching the order parameter from asc to desc (also with profile and explain):

{
  "took" : 2226,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 11858060,
    "max_score" : null,
    "hits" : [
      {
        "_shard" : "[geonames][0]",
        "_node" : "qXTymyB9QLmxhPtGEtA_mA",
        "_index" : "geonames",
        "_type" : "doc",
        "_id" : "Mq80LmkBrQo0YN4q11bA",
        "_score" : null,
        "_source" : {
          "id" : "4036351",
          "name" : "Bollons Seamount",
          "location" : {
            "lat" : "-49.66667",
            "lon" : "-176.16667"
          }
        },
        "sort" : [
          1.970427111052182E7
        ],
        "_explanation" : {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[qXTymyB9QLmxhPtGEtA_mA][geonames][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "MatchAllDocsQuery",
                "description" : "*:*",
                "time_in_nanos" : 268521404,
                "breakdown" : {
                  "score" : 0,
                  "build_scorer_count" : 54,
                  "match_count" : 0,
                  "create_weight" : 9333,
                  "next_doc" : 256458664,
                  "match" : 0,
                  "create_weight_count" : 1,
                  "next_doc_count" : 11858087,
                  "score_count" : 0,
                  "build_scorer" : 195265,
                  "advance" : 0,
                  "advance_count" : 0
                }
              }
            ],
            "rewrite_time" : 1142,
            "collector" : [
              {
                "name" : "CancellableCollector",
                "reason" : "search_cancelled",
                "time_in_nanos" : 1898324618,
                "children" : [
                  {
                    "name" : "SimpleFieldCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 1368306442
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
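
Comparing the two profiles side by side shows where the extra time goes. The short Python snippet below only copies numbers from the two outputs above and subtracts them. The dictionary keys are my own labels, not Elasticsearch field names.

```python
# Timings copied from the two profile outputs above (nanoseconds; "took" in ms).
asc = {"took_ms": 1374, "query_ns": 265_223_567,
       "cancellable_ns": 1_044_167_746, "field_collector_ns": 508_296_683,
       "next_doc_count": 11_858_087}
desc = {"took_ms": 2226, "query_ns": 268_521_404,
        "cancellable_ns": 1_898_324_618, "field_collector_ns": 1_368_306_442,
        "next_doc_count": 11_858_087}


def diff_seconds(key):
    """Difference desc - asc for a nanosecond field, in seconds."""
    return (desc[key] - asc[key]) / 1e9


took_diff_s = (desc["took_ms"] - asc["took_ms"]) / 1000
query_diff_s = diff_seconds("query_ns")
collector_diff_s = diff_seconds("field_collector_ns")
ratio = desc["field_collector_ns"] / asc["field_collector_ns"]

print(f"took:                 +{took_diff_s:.3f} s")
print(f"MatchAllDocsQuery:    +{query_diff_s:.3f} s")
print(f"SimpleFieldCollector: +{collector_diff_s:.3f} s (x{ratio:.2f})")
```

Both orders visit the same 11,858,087 documents and spend about 0.27 s in MatchAllDocsQuery, a difference of only about 3 ms. The SimpleFieldCollector, which builds the sorted top hits, takes about 0.86 s longer (roughly 2.7 times as long). That accounts for nearly all of the 852 ms difference in `took`, so the extra cost lies in sorting and collecting, not in matching.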

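So my question is: why is this happening? As I understand it, ES computes the distance from the origin to every other point and then sorts them. So why is the descending sort so much slower?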
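
The model described in the question computes the distance from the origin to every point and then keeps the best k. That model can be sketched in plain Python. The sketch uses the haversine formula with a mean Earth radius of 6,371,008.7714 m. I am assuming this radius matches what `distance_type: arc` uses. The two sample points are the top hits from the responses above, and `naive_top_k` is a hypothetical helper.

```python
import heapq
import math

EARTH_RADIUS_M = 6_371_008.7714  # assumed mean Earth radius for "arc" distances


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def naive_top_k(points, origin, k, descending=False):
    """Compute the distance to every point, then keep the k best.

    points: iterable of (name, lat, lon); origin: (lat, lon).
    """
    scored = [(haversine_m(origin[0], origin[1], lat, lon), name)
              for name, lat, lon in points]
    pick = heapq.nlargest if descending else heapq.nsmallest
    return pick(k, scored)


points = [
    ("Forêt de Wissembourg", 49.00924, 8.01542),
    ("Bollons Seamount", -49.66667, -176.16667),
]
origin = (49.0, 8.0)  # (lat, lon); the query's [8, 49] is [lon, lat]
print(naive_top_k(points, origin, 1))
print(naive_top_k(points, origin, 1, descending=True))
```

The computed distances agree closely with the `sort` values in the two responses (about 1,523 m and 19,704 km). Under this model, both directions would cost the same, which is exactly what the measurements contradict.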

Answer 1

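I asked the same question on the Elasticsearch discussion board and got an answer. Apparently, Elasticsearch uses different search strategies/algorithms for ascending and descending distance sorting.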

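For a descending sort, it computes the distance from the origin to every point and then sorts. For an ascending sort, it uses a bounding box to restrict the candidates to points near the origin and computes distances only for points inside that box.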
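
The bounding-box strategy described above can be illustrated with a simplified sketch. This is not Elasticsearch's or Lucene's actual code. For an ascending top-k, once k candidates exist, the current k-th best distance defines a box around the origin. Any point outside that box cannot beat it, so its distance is never computed, and the box shrinks as closer points arrive. In this scheme, a descending top-k has no such cheap rejection, so every distance is computed.

The box uses the standard latitude band plus the longitude bound asin(sin(r)/cos(lat0)). The helpers, the call counter, and the random data are hypothetical and exist only to count haversine evaluations.

```python
import heapq
import math
import random

EARTH_RADIUS_M = 6_371_008.7714  # assumed mean Earth radius, metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def box_for_radius(lat0, radius_m):
    """Half-widths (max |dlat|, max |dlon|) in degrees of a box around a spherical cap.

    max |dlon| is None when the cap contains a pole (longitude unrestricted).
    """
    r = radius_m / EARTH_RADIUS_M  # angular radius in radians
    lat0r = math.radians(lat0)
    if lat0r - r <= -math.pi / 2 or lat0r + r >= math.pi / 2:
        return math.degrees(r), None
    return math.degrees(r), math.degrees(math.asin(min(1.0, math.sin(r) / math.cos(lat0r))))


def pruned_nearest(points, origin, k):
    """Ascending top-k that skips points outside the current bounding box.

    Returns ([(distance_m, name), ...] sorted ascending, number of haversine calls).
    """
    lat0, lon0 = origin
    heap = []   # max-heap of the k best so far, stored as (-distance, name)
    box = None  # set once k candidates exist; shrinks as better ones arrive
    calls = 0
    for name, lat, lon in points:
        if box is not None:
            max_dlat, max_dlon = box
            dlon = abs((lon - lon0 + 180.0) % 360.0 - 180.0)  # handles the dateline
            if abs(lat - lat0) > max_dlat or (max_dlon is not None and dlon > max_dlon):
                continue  # outside the box: cannot beat the current k-th best
        calls += 1
        d = haversine_m(lat0, lon0, lat, lon)
        if len(heap) < k:
            heapq.heappush(heap, (-d, name))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, name))
        else:
            continue  # heap unchanged, so the box stays the same
        if len(heap) == k:
            box = box_for_radius(lat0, -heap[0][0])
    return sorted((-neg_d, name) for neg_d, name in heap), calls


# Hypothetical data: 20,000 points spread uniformly over the sphere.
random.seed(42)
pts = [(f"p{i}", math.degrees(math.asin(random.uniform(-1.0, 1.0))), random.uniform(-180.0, 180.0))
       for i in range(20_000)]
origin = (49.0, 8.0)  # (lat, lon)

nearest, calls = pruned_nearest(pts, origin, 5)
print("nearest:", nearest[0], "haversine calls:", calls, "of", len(pts))

# Descending top-k: with no box to reject points, every distance is computed.
farthest = heapq.nlargest(5, ((haversine_m(origin[0], origin[1], lat, lon), name)
                              for name, lat, lon in pts))
print("farthest:", farthest[0], "haversine calls:", len(pts), "of", len(pts))
```

On data like this, the ascending search computes only a small fraction of the distances, while the descending one computes all of them. How closely this mirrors the measured 1.4 s vs 2.2 s gap depends on Elasticsearch internals that this sketch does not model.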

Why is ascending geo-distance sorting faster than descending geo-distance sorting?
