I have a homegrown (not written by me) version control system with the following data structure:
create_table "activities", :force => true do |t|
  t.string   "source"
  t.datetime "created_at", :null => false
  t.datetime "updated_at", :null => false
  t.integer  "head_revision_id"
end

add_index "activities", ["head_revision_id"], :name => "index_activities_on_head_revision_id"
add_index "activities", ["source"], :name => "index_activities_on_source"

create_table "activity_revisions", :force => true do |t|
  t.integer  "activity_id"
  t.string   "activity_type"
  t.string   "title"
  t.text     "content"
  t.text     "comment"
  t.integer  "modified_by_id"
  t.datetime "created_at", :null => false
  t.datetime "updated_at", :null => false
end

add_index "activity_revisions", ["activity_id"], :name => "index_activity_revisions_on_activity_id"
add_index "activity_revisions", ["activity_type"], :name => "index_activity_revisions_on_activity_type"
add_index "activity_revisions", ["title"], :name => "index_activity_revisions_on_title"
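The application displays a list of activities from newest to oldest, paginated (will_paginate) at 20 per page. This is the query used to generate the list: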
Activity.where(conditions)
  .joins(:head_revision)
  .includes(:head_revision)
  .order('activities.id DESC')
The value of conditions varies depending on what is passed in from the search form. For the initial list display, conditions is empty.
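On the surface this query is simple, but it runs very slowly on a large data set. We currently have about 102,000 activity records and 512,000 activity_revision records. On our production server, the query takes nearly 2 seconds just to produce the count. In the development environment, it is awful.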
I suspect there is something wrong with the data model itself, and I am hoping someone can suggest a better approach.
Edit: EXPLAIN for the basic query with no conditions:
mysql> explain SELECT * FROM `activities` INNER JOIN `activity_revisions` ON `activity_revisions`.`id` = `activities`.`head_revision_id`;
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
| id | select_type | table              | type   | possible_keys                        | key     | key_len | ref                                        | rows   | Extra |
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
|  1 | SIMPLE      | activities         | ALL    | index_activities_on_head_revision_id | NULL    | NULL    | NULL                                       | 106590 |       |
|  1 | SIMPLE      | activity_revisions | eq_ref | PRIMARY                              | PRIMARY | 4       | cms_production.activities.head_revision_id |      1 |       |
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
2 rows in set (0.00 sec)
And for the count(*) query:
mysql> explain SELECT count(*) FROM `activities` INNER JOIN `activity_revisions` ON `activity_revisions`.`id` = `activities`.`head_revision_id`;
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+-------------+
| id | select_type | table              | type   | possible_keys                        | key                                  | key_len | ref                                        | rows   | Extra       |
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+-------------+
|  1 | SIMPLE      | activities         | index  | index_activities_on_head_revision_id | index_activities_on_head_revision_id | 5       | NULL                                       | 106590 | Using index |
|  1 | SIMPLE      | activity_revisions | eq_ref | PRIMARY                              | PRIMARY                              | 4       | cms_production.activities.head_revision_id |      1 | Using index |
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+-------------+
2 rows in set (0.00 sec)
Answer 0: (score: 0)
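I see you have already indexed several columns, which is good. I would say one of the best ways to keep a query as efficient as possible is to make sure that every column the database uses in the conditions of the query has an index.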
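To make that check concrete, here is a minimal sketch in plain Ruby that compares the columns a search filters on against the indexes declared in the schema from the question. The helper name unindexed_columns and the example filter columns are my own assumptions; the question does not show which fields the search form actually uses.

```ruby
# Single-column indexes declared in the schema from the question.
# Primary keys (id) are indexed implicitly and are not listed here.
INDEXED = {
  'activities'         => %w[head_revision_id source],
  'activity_revisions' => %w[activity_id activity_type title]
}.freeze

# Hypothetical helper: given the columns a search filters on (per table),
# return the ones that have no index in the list above.
def unindexed_columns(filters, indexed = INDEXED)
  filters.each_with_object({}) do |(table, columns), missing|
    gaps = columns - indexed.fetch(table, [])
    missing[table] = gaps unless gaps.empty?
  end
end

# Assumed example filters; substitute the columns your conditions really use.
filters = {
  'activities'         => %w[source],
  'activity_revisions' => %w[title modified_by_id]
}

unindexed_columns(filters).each do |table, columns|
  puts "#{table}: consider indexing #{columns.join(', ')}"
end
```

Any column this reports is only a candidate. Whether an index actually helps depends on how selective the column is, so confirm with EXPLAIN before adding one.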
Answer 1: (score: 0)
Guessing why a query is slow is no fun, and fortunately we don't have to.
Take a look at http://guides.rubyonrails.org/active_record_querying.html#running-explain and let's see what your Activity query is actually doing.
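It sounds like you are querying a MySQL database, so look at the key column of those EXPLAIN results. As MilesStanfield suggested, it sounds like you will find that you are not using your indexes effectively.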
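As a rough illustration of what to look for in the key column, here is a small plain-Ruby sketch that flags EXPLAIN rows where MySQL picked no index. The rows are copied from the first EXPLAIN output in the question (only the columns the check needs); the helper name full_scans is hypothetical.

```ruby
# Rows taken from the first EXPLAIN output in the question.
# A nil key corresponds to NULL in the key column: no index was chosen.
EXPLAIN_ROWS = [
  { table: 'activities',         type: 'ALL',    key: nil,       rows: 106_590 },
  { table: 'activity_revisions', type: 'eq_ref', key: 'PRIMARY', rows: 1 }
].freeze

# Hypothetical helper: returns the rows where MySQL scans the whole table
# (type ALL) or uses no index at all (key is NULL).
def full_scans(explain_rows)
  explain_rows.select { |row| row[:type] == 'ALL' || row[:key].nil? }
end

full_scans(EXPLAIN_ROWS).each do |row|
  puts "#{row[:table]}: no index used, about #{row[:rows]} rows examined"
end
```

For the plan in the question this flags only activities. Note that the count(*) plan would pass this check, yet its type of index still means every entry of index_activities_on_head_revision_id is read (about 106590 rows), so the rows column is worth reading alongside key.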