Optimizing a complex Postgres SQL SELECT statement?

Time: 2014-01-10 15:23:03

Tags: sql postgresql query-optimization

I have the following somewhat complex SELECT statement, with several joins, a GROUP BY, and an ORDER BY:

SELECT 
   COUNT(*) AS count_all, 
   "response_variables"."id", 
   "response_variables"."var_name" AS "response_variables_id_response_variables_var_name" 
FROM "response_variables" 
   INNER JOIN "responses" ON "responses"."id" = "response_variables"."response_id" 
   INNER JOIN "questions" ON "questions"."id" = "responses"."question_id" 
WHERE "questions"."key" = 'rbmmpmvs' 
GROUP BY "response_variables"."id", "response_variables"."var_name" 
ORDER BY "response_variables"."var_name" ASC;

Here is the output of running EXPLAIN ANALYZE on it:

 GroupAggregate  (cost=720.80..723.20 rows=120 width=9) (actual time=277.127..285.953 rows=15265 loops=1)
   ->  Sort  (cost=720.80..721.10 rows=120 width=9) (actual time=277.120..281.391 rows=15265 loops=1)
         Sort Key: response_variables.var_name, response_variables.id
         Sort Method: external merge  Disk: 288kB
         ->  Nested Loop  (cost=0.00..716.66 rows=120 width=9) (actual time=0.064..21.795 rows=15265 loops=1)
               ->  Nested Loop  (cost=0.00..657.78 rows=128 width=4) (actual time=0.050..7.919 rows=3042 loops=1)
                     ->  Index Scan using index_questions_on_key on questions  (cost=0.00..8.27 rows=1 width=4) (actual time=0.032..0.033 rows=1 loops=1)
                           Index Cond: ((key)::text = 'rbmmpmvs'::text)
                     ->  Index Scan using index_responses_on_question_id on responses  (cost=0.00..646.69 rows=282 width=8) (actual time=0.016..7.326 rows=3042 loops=1)
                           Index Cond: (question_id = questions.id)
               ->  Index Scan using index_response_variables_on_response_id on response_variables  (cost=0.00..0.42 rows=4 width=13) (actual time=0.002..0.003 rows=5 loops=3042)
                     Index Cond: (response_id = responses.id)
 Total runtime: 288.766 ms
(13 rows)

I have a variety of indexes in place, but I don't know where to start optimizing this query (or whether it is even possible).

2 Answers:

Answer 0 (score: 0)

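The condition in the WHERE clause applies to the innermost joined table. That is bad, because all of the joins have to be made before it is known whether the question row matches.

Instead, list the questions table first, reversing the table order: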

SELECT 
  COUNT(*) AS count_all, 
  response_variables.id, 
  response_variables.var_name AS response_variables_id_response_variables_var_name
FROM questions
JOIN responses ON questions.id = responses.question_id
JOIN response_variables ON responses.id = response_variables.response_id
WHERE questions.key = 'rbmmpmvs' 
GROUP BY response_variables.id, response_variables.var_name
ORDER BY response_variables.var_name

As long as there are indexes on questions(key) and on the id columns used in the joins, this should perform well.
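
One way to check which indexes already exist on the three tables is to query the pg_indexes system view. This is only a sketch, and it assumes the tables are not duplicated under the same names in other schemas. The plan in the question already shows index_questions_on_key, index_responses_on_question_id, and index_response_variables_on_response_id being used.

```sql
-- List the existing indexes on the tables involved in the query.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('questions', 'responses', 'response_variables')
ORDER BY tablename, indexname;
```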

I also removed all the unnecessary double quotes, which were just code noise.

Answer 1 (score: 0)
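
To see whether listing questions first actually changes anything, the plan of the reordered query can be compared with the original one. For plain inner joins PostgreSQL's planner generally picks the join order itself (for up to join_collapse_limit tables), so the two plans may well come out identical. The plan in the question already starts from the index scan on questions. A sketch of the check:

```sql
-- Run the reordered query under EXPLAIN ANALYZE and compare the
-- join order and timings with the plan shown in the question.
EXPLAIN ANALYZE
SELECT
  COUNT(*) AS count_all,
  response_variables.id,
  response_variables.var_name AS response_variables_id_response_variables_var_name
FROM questions
JOIN responses ON questions.id = responses.question_id
JOIN response_variables ON responses.id = response_variables.response_id
WHERE questions.key = 'rbmmpmvs'
GROUP BY response_variables.id, response_variables.var_name
ORDER BY response_variables.var_name;
```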

Try this:

SELECT 
  COUNT(*) AS count_all, 
  response_variables.id, 
  response_variables.var_name AS response_variables_id_response_variables_var_name
FROM response_variables
WHERE EXISTS (
  -- keep only variables whose response belongs to the requested question
  SELECT 1
  FROM responses
  JOIN questions ON questions.id = responses.question_id
  WHERE responses.id = response_variables.response_id
    AND questions.key = 'rbmmpmvs'
)
GROUP BY response_variables.id, response_variables.var_name
ORDER BY response_variables.var_name;

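Also, your query is performing a disk-based external merge sort, which can be quite slow. About 90% of the time is spent in the sort: the Sort node finishes at 281.391 ms while the Nested Loop beneath it finishes at 21.795 ms, a difference of 259.596 ms out of a total runtime of 288.766 ms.

As the explain plan shows, about 288 kB of data was written to disk because the data did not fit in work_mem. Increasing work_mem locally for the transaction lets the sort run as an in-memory quicksort, which should be much faster than the external merge sort.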
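
A minimal sketch of raising work_mem for a single transaction follows. The value '8MB' is an assumption chosen for illustration, not a measured requirement, and an in-memory sort usually needs more memory than the on-disk size reported in the plan. SET LOCAL only lasts until the end of the transaction, so other sessions and later transactions are unaffected.

```sql
BEGIN;

-- Assumed value for illustration; adjust after looking at the new plan.
SET LOCAL work_mem = '8MB';

EXPLAIN ANALYZE
SELECT
  COUNT(*) AS count_all,
  response_variables.id,
  response_variables.var_name AS response_variables_id_response_variables_var_name
FROM response_variables
  INNER JOIN responses ON responses.id = response_variables.response_id
  INNER JOIN questions ON questions.id = responses.question_id
WHERE questions.key = 'rbmmpmvs'
GROUP BY response_variables.id, response_variables.var_name
ORDER BY response_variables.var_name ASC;

COMMIT;
```

If the change has the intended effect, the Sort node in the new plan should report an in-memory method (a "Sort Method: quicksort  Memory: ..." line) instead of "external merge  Disk: 288kB".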