Why is ascending geo-distance sorting faster than descending geo-distance sorting?

Time: 2019-03-21 14:23:55

Tags: sorting elasticsearch distance

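I'm using Elasticsearch 6.6 and have an index (1 shard, 1 replica) containing the GeoNames (https://www.geonames.org/) dataset (index size = 1.3 GB, 11.8 million geo points). I have been experimenting with geo-distance sort queries that sort the whole index by distance from some origin point. After several tests I found that sorting in ascending order is always faster than sorting in descending order. Here is an example query (I also tested with larger "size" values):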

POST /geonames/_search?request_cache=false
{   
    "size":1,
    "sort" : [
        {
            "_geo_distance" : {
                "location" : [8, 49],
                "order" : "asc",
                "unit" : "m",
                "mode" : "min",
                "distance_type" : "arc",
                "ignore_unmapped": true
            }
        }
    ]
}

Here is the response for the ascending sort (with explain and profile set to true):

{
  "took" : 1374,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 11858060,
    "max_score" : null,
    "hits" : [
      {
        "_shard" : "[geonames][0]",
        "_node" : "qXTymyB9QLmxhPtGEtA_mA",
        "_index" : "geonames",
        "_type" : "doc",
        "_id" : "L781LmkBrQo0YN4qP48D",
        "_score" : null,
        "_source" : {
          "id" : "3034701",
          "name" : "Forêt de Wissembourg",
          "location" : {
            "lat" : "49.00924",
            "lon" : "8.01542"
          }
        },
        "sort" : [
          1523.4121312414704
        ],
        "_explanation" : {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[qXTymyB9QLmxhPtGEtA_mA][geonames][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "MatchAllDocsQuery",
                "description" : "*:*",
                "time_in_nanos" : 265223567,
                "breakdown" : {
                  "score" : 0,
                  "build_scorer_count" : 54,
                  "match_count" : 0,
                  "create_weight" : 10209,
                  "next_doc" : 253091268,
                  "match" : 0,
                  "create_weight_count" : 1,
                  "next_doc_count" : 11858087,
                  "score_count" : 0,
                  "build_scorer" : 263948,
                  "advance" : 0,
                  "advance_count" : 0
                }
              }
            ],
            "rewrite_time" : 1097,
            "collector" : [
              {
                "name" : "CancellableCollector",
                "reason" : "search_cancelled",
                "time_in_nanos" : 1044167746,
                "children" : [
                  {
                    "name" : "SimpleFieldCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 508296683
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

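And here is the response for the descending sort, obtained by only switching the "order" parameter from asc to desc (also with profile and explain):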

{
  "took" : 2226,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 11858060,
    "max_score" : null,
    "hits" : [
      {
        "_shard" : "[geonames][0]",
        "_node" : "qXTymyB9QLmxhPtGEtA_mA",
        "_index" : "geonames",
        "_type" : "doc",
        "_id" : "Mq80LmkBrQo0YN4q11bA",
        "_score" : null,
        "_source" : {
          "id" : "4036351",
          "name" : "Bollons Seamount",
          "location" : {
            "lat" : "-49.66667",
            "lon" : "-176.16667"
          }
        },
        "sort" : [
          1.970427111052182E7
        ],
        "_explanation" : {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[qXTymyB9QLmxhPtGEtA_mA][geonames][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "MatchAllDocsQuery",
                "description" : "*:*",
                "time_in_nanos" : 268521404,
                "breakdown" : {
                  "score" : 0,
                  "build_scorer_count" : 54,
                  "match_count" : 0,
                  "create_weight" : 9333,
                  "next_doc" : 256458664,
                  "match" : 0,
                  "create_weight_count" : 1,
                  "next_doc_count" : 11858087,
                  "score_count" : 0,
                  "build_scorer" : 195265,
                  "advance" : 0,
                  "advance_count" : 0
                }
              }
            ],
            "rewrite_time" : 1142,
            "collector" : [
              {
                "name" : "CancellableCollector",
                "reason" : "search_cancelled",
                "time_in_nanos" : 1898324618,
                "children" : [
                  {
                    "name" : "SimpleFieldCollector",
                    "reason" : "search_top_hits",
                    "time_in_nanos" : 1368306442
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

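So my question is: why is this the case? As far as I understand, Elasticsearch computes the distance from the origin to every other point and then sorts them. The query phase takes about the same time in both runs (roughly 265 ms vs. 269 ms), but the collector takes about 1.04 s for ascending and about 1.90 s for descending. So why is the descending sort so much slower?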

1 answer:

Answer 0: (score: 0)

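I asked the same question on the Elasticsearch discussion board and got an answer. Apparently Elasticsearch uses different search strategies/algorithms for ascending and descending distance sorting.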

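For descending sorting, it computes the distance from the origin to every point and then sorts. For ascending sorting, it uses a bounding box to filter for points near the origin and computes distances only for the points inside that bounding box.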
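
To make the difference concrete, here is a minimal Python sketch of the idea described above. It is not Elasticsearch's or Lucene's actual code: the function names (haversine_m, bounding_box, nearest_k, farthest_k), the Earth-radius constant, and the simplified handling of the poles and the date line are all assumptions made for illustration. The ascending version keeps the k best distances seen so far and skips any point that falls outside a box around the origin sized by the current k-th best distance; the descending version has no comparable shortcut in this sketch, so it computes every distance.

```python
import heapq
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle ("arc") distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def bounding_box(lat, lon, radius_m):
    """Conservative box (degrees) containing all points within radius_m of (lat, lon)."""
    delta = radius_m / EARTH_RADIUS_M  # angular radius in radians
    phi = math.radians(lat)
    lat_min, lat_max = phi - delta, phi + delta
    if lat_min <= -math.pi / 2 or lat_max >= math.pi / 2:
        # The circle reaches a pole: every longitude is possible.
        return (max(math.degrees(lat_min), -90.0),
                min(math.degrees(lat_max), 90.0), -180.0, 180.0)
    dlon = math.degrees(math.asin(min(1.0, math.sin(delta) / math.cos(phi))))
    lon_min, lon_max = lon - dlon, lon + dlon
    if lon_min < -180.0 or lon_max > 180.0:
        # The box crosses the date line: give up longitude pruning (simplification).
        lon_min, lon_max = -180.0, 180.0
    return (math.degrees(lat_min), math.degrees(lat_max), lon_min, lon_max)


def nearest_k(origin, points, k):
    """Ascending top-k; returns (sorted hits, number of distance computations)."""
    lat0, lon0 = origin
    heap = []   # max-heap via negated distances: (-distance, point index)
    box = None  # set once k candidates have been collected
    calls = 0
    for i, (lat, lon) in enumerate(points):
        if box is not None:
            lat_min, lat_max, lon_min, lon_max = box
            if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
                continue  # cannot beat the current k-th best; skip the haversine
        d = haversine_m(lat0, lon0, lat, lon)
        calls += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
        else:
            continue
        if len(heap) == k:
            # Shrink the box to the current k-th best distance.
            box = bounding_box(lat0, lon0, -heap[0][0])
    hits = sorted((-neg, points[i]) for neg, i in heap)
    return hits, calls


def farthest_k(origin, points, k):
    """Descending top-k; every distance is computed before selecting."""
    lat0, lon0 = origin
    dists = [(haversine_m(lat0, lon0, lat, lon), (lat, lon)) for lat, lon in points]
    return heapq.nlargest(k, dists), len(dists)


# Origin from the query above ("location": [8, 49] is [lon, lat]).
origin = (49.0, 8.0)
points = [(49.00924, 8.01542), (-49.66667, -176.16667), (48.0, 9.0), (10.0, 100.0)]
near, near_calls = nearest_k(origin, points, 1)
far, far_calls = farthest_k(origin, points, 1)
print(near, near_calls)
print(far, far_calls)
```

How much work the ascending version saves depends on the order in which points are visited: the earlier a close point is seen, the sooner the box shrinks. In this toy input the closest point comes first, so the remaining three points are rejected by cheap comparisons alone, while the descending version still computes all four distances.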