Improving the performance of a complex PostgreSQL query

Date: 2015-07-17 17:49:23

Tags: postgresql

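I am trying to figure out how to reduce the running time of this query. I was told to use EXPLAIN ANALYZE, but I don't know how to interpret the results or what changes to make. Any suggestions? Note that I am using a third-party hosted database (cartoDB), so I don't think I have the option of creating indexes.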

Here is the query. Of the two tables involved, one has about 40 rows and the other about 32,000 rows.

EXPLAIN ANALYZE SELECT
  id, identifier,
  CASE
    WHEN dist <  8046. THEN 1
    WHEN dist <  16093. THEN 2
    WHEN dist < 40233. THEN 3
    WHEN dist < 80467. THEN 4
    WHEN dist < 160934. THEN 5
    ELSE 6
  END AS grp,
  count(*)
FROM (
    SELECT s.id, s.identifier, ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist
    FROM full_data_for_testing_deid_2 c, demo_locations_table s) AS loc_dist
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3

Here is the output of EXPLAIN ANALYZE:
{
  "fields" : {
    "QUERY PLAN" : {
      "type" : "string"
    }
  },
  "rows" : [
    {
      "QUERY PLAN" : "GroupAggregate  (cost=373146.40..651612.12 rows=1058805 width=128) (actual time=34120.054..37536.893 rows=197 loops=1)"
    },
    {
      "QUERY PLAN" : "  ->  Sort  (cost=373146.40..373675.81 rows=1058805 width=128) (actual time=34120.000..36504.439 rows=1058805 loops=1)"
    },
    {
      "QUERY PLAN" : "        Sort Key: s.id, s.identifier, (CASE WHEN (_st_distance(geography(s.the_geom), geography(c.the_geom), 0::double precision, false) < 8046::double precision) THEN 1 WHEN (_st_distance(geography(s.the_geom), geography(c.the_geom), 0::double precision, false) < 16093::double precision) THEN 2 WHEN (_st_distance(geography(s.the_geom), geography(c.the_geom), 0::double precision, false) < 40233::double precision) THEN 3 WHEN (_st_distance(geography(s.the_geom), geography(c.the_geom), 0::double precision, false) < 80467::double precision) THEN 4 WHEN (_st_distance(geography(s.the_geom), geography(c.the_geom), 0::double precision, false) < 160934::double precision) THEN 5 ELSE 6 END)"
    },
    {
      "QUERY PLAN" : "        Sort Method: external merge  Disk: 35200kB"
    },
    {
      "QUERY PLAN" : "        ->  Nested Loop  (cost=0.00..283194.48 rows=1058805 width=128) (actual time=0.688..13487.097 rows=1058805 loops=1)"
    },
    {
      "QUERY PLAN" : "              ->  Seq Scan on full_data_for_testing_deid_2 c  (cost=0.00..6845.26 rows=32085 width=32) (actual time=0.006..130.054 rows=32085 loops=1)"
    },
    {
      "QUERY PLAN" : "              ->  Materialize  (cost=0.00..1.13 rows=33 width=96) (actual time=0.001..0.028 rows=33 loops=32085)"
    },
    {
      "QUERY PLAN" : "                    ->  Seq Scan on demo_locations_table s  (cost=0.00..1.10 rows=33 width=96) (actual time=0.003..0.034 rows=33 loops=1)"
    },
    {
      "QUERY PLAN" : "Total runtime: 37569.205 ms"
    }
  ],
  "time" : 37.574,
  "total_rows" : 9
}

1 answer:

Answer 0 (score: 0)

The problem is the Cartesian product:

    SELECT s.id, s.identifier, ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist
    FROM full_data_for_testing_deid_2 c, demo_locations_table s

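It shows up as the Nested Loop in the plan. I don't think you want a full Cartesian product here. You can easily cut out some unnecessary iterations with a more specific JOIN ... ON condition. The distance between two points is a symmetric (commutative) function, so just add a condition such as c.pk > s.pk, depending on your needs (there is no information about the schema design).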
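
A minimal sketch of the original query with that join condition added might look like the following. The column name pk is hypothetical, since the schema is not shown, and the condition only avoids redundant work if both tables contain the same set of points, so that each pair would otherwise be measured twice. If the two tables hold different points, this filter changes the counts instead of just halving the work.

```sql
-- Sketch only: "pk" is a hypothetical key column; the real schema is unknown.
-- Assumes both tables describe the same points, so (a, b) and (b, a)
-- would otherwise both be measured.
SELECT
  id, identifier,
  CASE
    WHEN dist <   8046 THEN 1
    WHEN dist <  16093 THEN 2
    WHEN dist <  40233 THEN 3
    WHEN dist <  80467 THEN 4
    WHEN dist < 160934 THEN 5
    ELSE 6
  END AS grp,
  count(*)
FROM (
    SELECT s.id, s.identifier,
           ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist
    FROM full_data_for_testing_deid_2 c
    JOIN demo_locations_table s
      ON c.pk > s.pk   -- hypothetical column; keeps one pair out of each (a, b)/(b, a)
) AS loc_dist
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
```

Running this under EXPLAIN ANALYZE again should show whether the row count leaving the Nested Loop (1058805 in the plan above, that is 32085 x 33) actually drops.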