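I have a poorly performing SQL query. I've done some research on joins, watched tutorials, made sure I have the right indexes defined, and so on, but honestly I'm a bit lost as to how to improve the performance of this query.
I have the following schema definitions: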
create_table "training_plans", :force => true do |t|
  t.integer "user_id"
end
add_index "training_plans", ["user_id"], :name => "index_training_plans_on_user_id"
create_table "training_weeks", :force => true do |t|
  t.integer "training_plan_id"
  t.date "start_date"
end
add_index "training_weeks", ["training_plan_id", "start_date"], :name => "index_training_weeks_on_training_plan_id_and_start_date"
add_index "training_weeks", ["training_plan_id"], :name => "index_training_weeks_on_training_plan_id"
create_table "training_efforts", :force => true do |t|
  t.string "name"
  t.date "plandate"
  t.integer "training_week_id"
end
add_index "training_efforts", ["plandate"], :name => "index_training_efforts_on_plandate"
add_index "training_efforts", ["training_week_id", "plandate"], :name => "index_training_efforts_on_training_week_id_and_plandate"
add_index "training_efforts", ["training_week_id"], :name => "index_training_efforts_on_training_week_id"
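I then make the following call to collect all training_efforts associated with a particular training_plan, including all associated ride objects, where the training_effort plandate falls within the target date range, with the results ordered by plandate.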
tefts = self.training_efforts.includes(:rides).order("plandate ASC").where("plandate >= ? AND plandate <= ?",
beginning_date,
end_date)
This produces the following query output:
TrainingEffort Load (3393.6ms) SELECT "training_efforts".* FROM "training_efforts"
INNER JOIN "training_weeks" ON "training_efforts"."training_week_id" = "training_weeks"."id"
WHERE "training_weeks"."training_plan_id" = 104
AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC
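I believe I have the right indexes defined. The tables aren't that big. Yet this takes a huge amount of time. As further context, this is on Heroku Postgres. Lastly, I'll mention that on my development system the query is slower than most (3.3ms), but still nowhere near 1000 times slower than average...
Thanks in advance for any help optimizing this query.
UPDATE Here is the EXPLAIN output for the query (run on my development system):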
explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks"
ON "training_efforts"."training_week_id" = "training_weeks"."id"
WHERE "training_weeks"."training_plan_id" = 7
AND (plandate >= '2015-01-05' AND plandate <= '2016-01-03') ORDER BY plandate ASC;
QUERY PLAN
-----------------------------------------------------------------------------------------------
 Sort (cost=430.52..432.04 rows=606 width=120)
   Sort Key: training_efforts.plandate
   -> Hash Join (cost=15.12..402.51 rows=606 width=120)
         Hash Cond: (training_efforts.training_week_id = training_weeks.id)
         -> Seq Scan on training_efforts (cost=0.00..377.25 rows=1089 width=120)
               Filter: ((plandate >= '2015-01-05'::date) AND (plandate <= '2016-01-03'::date))
         -> Hash (cost=11.86..11.86 rows=261 width=4)
               -> Seq Scan on training_weeks (cost=0.00..11.86 rows=261 width=4)
                     Filter: (training_plan_id = 7)
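UPDATE 2 I tried a different query to see whether my indexes would be used. Noting that there are 7 times as many training_efforts as training_weeks (both have date columns), I tried filtering on the training_week date instead of the training_effort date, as follows: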
explain SELECT "training_efforts".* FROM "training_efforts" INNER JOIN "training_weeks"
ON "training_weeks"."id" = "training_efforts"."training_week_id"
WHERE "training_weeks"."id" IN (SELECT "training_weeks"."id" FROM "training_weeks"
WHERE "training_weeks"."training_plan_id" = 7 AND (start_date >= '2015-01-05' AND start_date <= '2016-01-03'))
ORDER BY plandate ASC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Sort (cost=376.83..378.34 rows=602 width=120)
   Sort Key: training_efforts.plandate
   -> Nested Loop (cost=14.23..349.04 rows=602 width=120)
         -> Hash Semi Join (cost=13.95..26.83 rows=86 width=8)
               Hash Cond: (training_weeks.id = training_weeks_1.id)
               -> Seq Scan on training_weeks (cost=0.00..10.69 rows=469 width=4)
               -> Hash (cost=12.87..12.87 rows=86 width=4)
                     -> Bitmap Heap Scan on training_weeks training_weeks_1 (cost=5.37..12.87 rows=86 width=4)
                           Recheck Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date))
                           -> Bitmap Index Scan on index_training_weeks_on_training_plan_id_and_start_date (cost=0.00..5.35 rows=86 width=0)
                                 Index Cond: ((training_plan_id = 7) AND (start_date >= '2015-01-05'::date) AND (start_date <= '2016-01-03'::date))
         -> Index Scan using index_training_efforts_on_training_week_id on training_efforts (cost=0.28..3.68 rows=7 width=120)
               Index Cond: (training_week_id = training_weeks.id)
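This seems slightly better, but I'm still not convinced it's optimal...
Answer 0 (score: 0)
How many rows are in each table? Did you recreate these tables recently, or are they old? Have you analyzed these tables recently? It looks like the planner is doing sequential scans and not using any indexes.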
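I'd run VACUUM ANALYZE on your entire database, or at least on these two tables. The optimizer will often skip an index if it doesn't have accurate statistics for the table.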
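Answer 1 (score: 0)
It looks like you aren't actually using the output of the JOIN, so I'd suggest dropping it entirely and seeing whether that improves performance.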
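To make the "you aren't using the JOIN's output" point concrete, here is a toy, in-memory Ruby sketch (plain Structs and made-up sample rows, not ActiveRecord) contrasting an inner join with a semi-join. Because training_weeks.id is unique, each effort matches at most one week, so keeping only the efforts for which a qualifying week exists returns the same rows as the join; the semi-join simply never pairs up the week columns. It illustrates the logic only, not how Postgres actually executes either plan.

```ruby
require "date"
require "set"

# Toy stand-ins for table rows; the sample data below is made up.
Week   = Struct.new(:id, :training_plan_id, :start_date)
Effort = Struct.new(:id, :training_week_id, :plandate)

# Inner-join style: pair each effort with every matching week row,
# then keep only the effort side of each pair.
def efforts_via_join(efforts, weeks, plan_id, from, to)
  pairs = efforts.flat_map do |e|
    weeks.select { |w| w.id == e.training_week_id && w.training_plan_id == plan_id }
         .map { |w| [e, w] }
  end
  pairs.map(&:first)
       .select { |e| e.plandate.between?(from, to) }
       .sort_by(&:plandate)
end

# Semi-join (EXISTS) style: only ask whether a qualifying week exists.
def efforts_via_semi_join(efforts, weeks, plan_id, from, to)
  week_ids = weeks.select { |w| w.training_plan_id == plan_id }.map(&:id).to_set
  efforts.select { |e| week_ids.include?(e.training_week_id) && e.plandate.between?(from, to) }
         .sort_by(&:plandate)
end

WEEKS = [
  Week.new(1, 7, Date.new(2015, 1, 5)),
  Week.new(2, 7, Date.new(2015, 1, 12)),
  Week.new(3, 8, Date.new(2015, 1, 5)),   # belongs to a different plan
  Week.new(4, 7, Date.new(2014, 12, 29))  # its effort falls before the range
]
EFFORTS = [
  Effort.new(10, 2, Date.new(2015, 1, 13)),
  Effort.new(11, 1, Date.new(2015, 1, 6)),
  Effort.new(12, 3, Date.new(2015, 1, 7)),
  Effort.new(13, 4, Date.new(2014, 12, 30))
]

RANGE_FROM = Date.new(2015, 1, 5)
RANGE_TO   = Date.new(2016, 1, 3)
p efforts_via_semi_join(EFFORTS, WEEKS, 7, RANGE_FROM, RANGE_TO).map(&:id) # prints [11, 10]
```

Both functions return efforts 11 and 10 (ordered by plandate) for this data; the difference is only in how much work is done to get there.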
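I'd suggest using a raw SQL query. You can still run it through ActiveRecord: replace the parameters that vary (i.e. the variables) with ? placeholders and pass their values along with the SQL string. Note that connection.execute itself doesn't take bind values; find_by_sql, on the other hand, accepts an array of the SQL string followed by the values.
For the raw SQL, I'd suggest trying something like the following (substitute the placeholders and parameters as needed to fit any different parameters). I suspect it will perform better.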
SELECT te.*
FROM training_efforts AS te
WHERE EXISTS (SELECT 1
              FROM training_weeks AS tw
              WHERE tw.id = te.training_week_id
                AND tw.training_plan_id = 7
                AND tw.start_date >= '2015-01-05' AND tw.start_date <= '2016-01-03'
             )
ORDER BY te.plandate ASC
As for converting it into an ActiveRecord query, I'm not sure ActiveRecord offers that level of control; it's probably best to keep it as a raw query.
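For the raw-query route, a minimal sketch of building the [sql, *values] array that ActiveRecord's find_by_sql accepts, so the dates and plan id are bound through ? placeholders instead of being interpolated by hand. The helper name efforts_in_plan_sql is hypothetical, and the sketch only builds the array so it runs without Rails; the commented-out last line shows the assumed call inside a Rails app. Records loaded with find_by_sql will not have rides preloaded the way includes(:rides) did, so accessing rides would trigger one query per effort.

```ruby
require "date"

# Hypothetical helper: returns [sql, *binds] in the shape that
# ActiveRecord's find_by_sql accepts. Each ? is filled in, in order,
# by the values that follow the SQL string.
def efforts_in_plan_sql(plan_id, beginning_date, end_date)
  sql = <<~SQL
    SELECT te.*
    FROM training_efforts AS te
    WHERE EXISTS (SELECT 1
                  FROM training_weeks AS tw
                  WHERE tw.id = te.training_week_id
                    AND tw.training_plan_id = ?
                    AND tw.start_date >= ? AND tw.start_date <= ?)
    ORDER BY te.plandate ASC
  SQL
  [sql, plan_id, beginning_date, end_date]
end

query = efforts_in_plan_sql(7, Date.new(2015, 1, 5), Date.new(2016, 1, 3))
puts query.first.count("?") # number of placeholders, should match the 3 values

# Inside a Rails app (not executed here):
# tefts = TrainingEffort.find_by_sql(query)
```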