Optimizing a SQL JOIN call

Date: 2015-01-05 22:02:38

Tags: sql ruby-on-rails postgresql heroku-postgres

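I have a SQL query that performs poorly. I've done some research on joins, gone through tutorials, made sure I've defined the right indexes, and so on, but honestly I'm a bit lost on how to improve the performance of this query.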

I have the following schema definition:

create_table "training_plans", :force => true do |t|
  t.integer  "user_id"
end

add_index "training_plans", ["user_id"], :name => "index_training_plans_on_user_id"

create_table "training_weeks", :force => true do |t|
  t.integer  "training_plan_id"
  t.date     "start_date"
end

add_index "training_weeks", ["training_plan_id", "start_date"], :name => "index_training_weeks_on_training_plan_id_and_start_date"
add_index "training_weeks", ["training_plan_id"], :name => "index_training_weeks_on_training_plan_id"

create_table "training_efforts", :force => true do |t|
  t.string   "name"
  t.date     "plandate"
  t.integer  "training_week_id"
end

add_index "training_efforts", ["plandate"], :name => "index_training_efforts_on_plandate"
add_index "training_efforts", ["training_week_id", "plandate"], :name => "index_training_efforts_on_training_week_id_and_plandate"
add_index "training_efforts", ["training_week_id"], :name => "index_training_efforts_on_training_week_id"

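I then make the following call to collect all training_efforts associated with a particular training_plan, including all associated ride objects, where the training_effort plandate falls within the target date range, with the results ordered by plandate.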

    tefts = self.training_efforts.includes(:rides).order("plandate ASC").where("plandate >= ? AND plandate <= ?",
                                                      beginning_date,
                                                      end_date)

This produces the following query output:

TrainingEffort Load (3393.6ms)  SELECT "training_efforts".* FROM "training_efforts" 
  INNER JOIN "training_weeks" ON "training_efforts"."training_week_id" = "training_weeks"."id" 
  WHERE "training_weeks"."training_plan_id" = 104 
  AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC

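I believe I've defined the right indexes, and the tables aren't that large, yet the query takes a huge amount of time. As further background, this is on Heroku Postgres. Finally, I'll mention that on my development system the query is slower than most (3.3ms), but still nowhere near 1000 times slower than average...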

Thanks in advance for any help optimizing this query.

Update: here is the EXPLAIN output for the query (run on my development system):

explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks" 
  ON "training_efforts"."training_week_id" = "training_weeks"."id" 
  WHERE "training_weeks"."training_plan_id" = 7 
  AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Sort  (cost=430.52..432.04 rows=606 width=120)
   Sort Key: training_efforts.plandate
   ->  Hash Join  (cost=15.12..402.51 rows=606 width=120)
         Hash Cond: (training_efforts.training_week_id = training_weeks.id)
         ->  Seq Scan on training_efforts  (cost=0.00..377.25 rows=1089 width=120)
               Filter: ((plandate >= '2015-01-05'::date) AND (plandate <= '2016-01-03'::date))
         ->  Hash  (cost=11.86..11.86 rows=261 width=4)
               ->  Seq Scan on training_weeks  (cost=0.00..11.86 rows=261 width=4)
                     Filter: (training_plan_id = 7) 

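Update 2: I tried a different query to see whether my indexes would be used. Noting that there are 7 times as many training_efforts as training_weeks (both have a date column), I figured I'd try searching on the training_week date instead of the training_effort date, as follows: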

explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks" 
  ON "training_weeks"."id" = "training_efforts"."training_week_id" 
  WHERE "training_weeks"."id" IN (SELECT "training_weeks"."id" FROM "training_weeks" 
  WHERE "training_weeks"."training_plan_id" = 7 AND (start_date >= '2015-01-05' AND start_date <= '2016-01-03')) 
  ORDER BY plandate ASC;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=376.83..378.34 rows=602 width=120)
   Sort Key: training_efforts.plandate
   ->  Nested Loop  (cost=14.23..349.04 rows=602 width=120)
         ->  Hash Semi Join  (cost=13.95..26.83 rows=86 width=8)
               Hash Cond: (training_weeks.id = training_weeks_1.id)
               ->  Seq Scan on training_weeks  (cost=0.00..10.69 rows=469 width=4)
               ->  Hash  (cost=12.87..12.87 rows=86 width=4)
                     ->  Bitmap Heap Scan on training_weeks training_weeks_1  (cost=5.37..12.87 rows=86 width=4)
                           Recheck Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date))
                           ->  Bitmap Index Scan on index_training_weeks_on_training_plan_id_and_start_date  (cost=0.00..5.35 rows=86 width=0)
                                 Index Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date))
         ->  Index Scan using index_training_efforts_on_training_week_id on training_efforts  (cost=0.28..3.68 rows=7 width=120)
               Index Cond: (training_week_id = training_weeks.id)

This seems slightly better, but I'm still not convinced it's optimal...

2 Answers:

Answer 0: (score: 0)

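How many rows are in each table? Did you recreate these tables recently, or are they old? Have you analyzed the tables recently? It looks like it's doing seq scans and not using any indexes.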

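I would run a

vacuum analyze

on your whole database, or at least on these two tables. Quite often the optimizer will skip an index if it doesn't have correct statistics for the table.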

Answer 1: (score: 0)

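It looks like you're not actually using the output of the JOIN, so I'd suggest dropping it entirely and seeing whether that improves performance.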

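I'd suggest using a raw query. You should be able to do this with the ActiveRecord object's connection.execute method and a SQL string: replace the arguments that need to be inserted by the SQL library (i.e. the variable ones) with ?, then pass those arguments as a list as the second arg to the method.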

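For the raw SQL, I'd suggest trying something like the following (substitute placeholders and arguments as needed to accommodate different parameters). I suspect this will perform better.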

SELECT te.*
FROM training_efforts AS te
WHERE EXISTS (SELECT 1
              FROM training_weeks AS tw
              WHERE tw.id = te.training_week_id
                AND tw.training_plan_id = 7
                AND tw.start_date >= '2015-01-05' AND tw.start_date <= '2016-01-03'
            )
ORDER BY te.plandate ASC
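
As a rough sketch of how a query like this might be parameterized from Ruby: the helper below builds the array form that ActiveRecord's find_by_sql accepts (a SQL string with ? placeholders, followed by the values). Note that this swaps in find_by_sql rather than connection.execute, since the array form with ? placeholders is what find_by_sql is known to accept. The names EFFORTS_FOR_PLAN_SQL and efforts_for_plan_query are hypothetical, and the TrainingEffort model name is assumed from the query log in the question. The subquery joins on tw.id, which is the column the question's own queries join against.

```ruby
require "date"

# Hypothetical constant (not from the original post): the EXISTS query
# with "?" placeholders in place of the literal plan id and dates.
EFFORTS_FOR_PLAN_SQL = <<~SQL
  SELECT te.*
  FROM training_efforts AS te
  WHERE EXISTS (SELECT 1
                FROM training_weeks AS tw
                WHERE tw.id = te.training_week_id
                  AND tw.training_plan_id = ?
                  AND tw.start_date >= ? AND tw.start_date <= ?)
  ORDER BY te.plandate ASC
SQL

# Hypothetical helper: returns [sql, *values], the array form that
# find_by_sql sanitizes and interpolates for us.
def efforts_for_plan_query(plan_id, beginning_date, end_date)
  [EFFORTS_FOR_PLAN_SQL, plan_id, beginning_date, end_date]
end

query = efforts_for_plan_query(7, Date.new(2015, 1, 5), Date.new(2016, 1, 3))

# Inside a Rails app with the (assumed) TrainingEffort model loaded:
#   tefts = TrainingEffort.find_by_sql(query)
```

This does not eager-load the associated rides the way includes(:rides) does in the question; that would need to be handled separately.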

As for translating this into an ActiveRecord query, I'm not sure ActiveRecord offers that level of control; it's probably best to keep it as a raw query.