我使用的是Elasticsearch 6.6,并具有一个索引(1个碎片,1个副本),其中索引了地理名称(https://www.geonames.org/)数据集(indexsize = 1.3 gb,11.8 mio geopoints)。 我在玩地理距离排序查询,将整个索引排序为一些原点。因此,经过一些测试,我发现升序排序总是快于降序排序。这是一个示例查询(我也用更大的“大小”参数进行了测试):
POST /geonames/_search?request_cache=false
{
"size":1,
"sort" : [
{
"_geo_distance" : {
"location" : [8, 49],
"order" : "asc",
"unit" : "m",
"mode" : "min",
"distance_type" : "arc",
"ignore_unmapped": true
}
}
]
}
以下是升序排序的答案(解释和配置文件为True):
{
"took" : 1374,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 11858060,
"max_score" : null,
"hits" : [
{
"_shard" : "[geonames][0]",
"_node" : "qXTymyB9QLmxhPtGEtA_mA",
"_index" : "geonames",
"_type" : "doc",
"_id" : "L781LmkBrQo0YN4qP48D",
"_score" : null,
"_source" : {
"id" : "3034701",
"name" : "Forêt de Wissembourg",
"location" : {
"lat" : "49.00924",
"lon" : "8.01542"
}
},
"sort" : [
1523.4121312414704
],
"_explanation" : {
"value" : 1.0,
"description" : "*:*",
"details" : [ ]
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[qXTymyB9QLmxhPtGEtA_mA][geonames][0]",
"searches" : [
{
"query" : [
{
"type" : "MatchAllDocsQuery",
"description" : "*:*",
"time_in_nanos" : 265223567,
"breakdown" : {
"score" : 0,
"build_scorer_count" : 54,
"match_count" : 0,
"create_weight" : 10209,
"next_doc" : 253091268,
"match" : 0,
"create_weight_count" : 1,
"next_doc_count" : 11858087,
"score_count" : 0,
"build_scorer" : 263948,
"advance" : 0,
"advance_count" : 0
}
}
],
"rewrite_time" : 1097,
"collector" : [
{
"name" : "CancellableCollector",
"reason" : "search_cancelled",
"time_in_nanos" : 1044167746,
"children" : [
{
"name" : "SimpleFieldCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 508296683
}
]
}
]
}
],
"aggregations" : [ ]
}
]
}
}
这里是降序,只需将参数从asc切换到desc(也带有profile和explain):
{
"took" : 2226,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 11858060,
"max_score" : null,
"hits" : [
{
"_shard" : "[geonames][0]",
"_node" : "qXTymyB9QLmxhPtGEtA_mA",
"_index" : "geonames",
"_type" : "doc",
"_id" : "Mq80LmkBrQo0YN4q11bA",
"_score" : null,
"_source" : {
"id" : "4036351",
"name" : "Bollons Seamount",
"location" : {
"lat" : "-49.66667",
"lon" : "-176.16667"
}
},
"sort" : [
1.970427111052182E7
],
"_explanation" : {
"value" : 1.0,
"description" : "*:*",
"details" : [ ]
}
}
]
},
"profile" : {
"shards" : [
{
"id" : "[qXTymyB9QLmxhPtGEtA_mA][geonames][0]",
"searches" : [
{
"query" : [
{
"type" : "MatchAllDocsQuery",
"description" : "*:*",
"time_in_nanos" : 268521404,
"breakdown" : {
"score" : 0,
"build_scorer_count" : 54,
"match_count" : 0,
"create_weight" : 9333,
"next_doc" : 256458664,
"match" : 0,
"create_weight_count" : 1,
"next_doc_count" : 11858087,
"score_count" : 0,
"build_scorer" : 195265,
"advance" : 0,
"advance_count" : 0
}
}
],
"rewrite_time" : 1142,
"collector" : [
{
"name" : "CancellableCollector",
"reason" : "search_cancelled",
"time_in_nanos" : 1898324618,
"children" : [
{
"name" : "SimpleFieldCollector",
"reason" : "search_top_hits",
"time_in_nanos" : 1368306442
}
]
}
]
}
],
"aggregations" : [ ]
}
]
}
}
所以我的问题是,为什么会这样?据我了解,Es计算从原点到其他每个点的距离,然后对它们进行排序。那么,为什么降序排序这么慢?
答案 0 :(得分:0)
在Elasticsearch板上询问相同的问题,并得到一个answer。 因此,显然Elasticsearch使用不同的搜索策略/算法来进行末端降序排序。
对于降序排序,它计算从原点到每个点末端的距离,然后进行排序。 对于升序排序,它使用边界框过滤原点附近的点,并且仅计算边界框内点的距离。