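I'm doing a really complicated find with lots of includes, which rails splits into a sequence of discrete queries rather than doing a single big join. The queries are really slow, even though my dataset isn't large: none of the tables has more than a few thousand records.
I have indexed all of the fields that are examined in the queries, but I'm worried that the indexes aren't helping for some reason. I installed a plugin called "query_reviewer", which looks at the queries used to build a page and lists problems with them. It states that indexes aren't being used, and it shows the result of calling 'explain' on each query, which lists various problems. Here's an example find call: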
Question.paginate(:all, {:page=>1, :include=>[:answers, :quizzes, :subject, {:taggings=>:tag}, {:gradings=>[:age_group, :difficulty]}], :conditions=>["((questions.subject_id = ?) or (questions.subject_id = ? and tags.name = ?))", "1", 19, "English"], :order=>"subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id", :per_page=>30})
Here are the SQL queries it generates:
SELECT DISTINCT `questions`.id
FROM `questions`
LEFT OUTER JOIN `taggings` ON `taggings`.taggable_id = `questions`.id
AND `taggings`.taggable_type = 'Question'
LEFT OUTER JOIN `tags` ON `tags`.id = `taggings`.tag_id
LEFT OUTER JOIN `subjects` ON `subjects`.id = `questions`.subject_id
LEFT OUTER JOIN `gradings` ON gradings.question_id = questions.id
WHERE (((questions.subject_id = '1') or (questions.subject_id = 19 and tags.name = 'English')))
ORDER BY subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id
LIMIT 0, 30
SELECT `questions`.`id` AS t0_r0 <..etc...>
FROM `questions`
LEFT OUTER JOIN `answers` ON answers.question_id = questions.id
LEFT OUTER JOIN `quiz_questions` ON (`questions`.`id` = `quiz_questions`.`question_id`)
LEFT OUTER JOIN `quizzes` ON (`quizzes`.`id` = `quiz_questions`.`quiz_id`)
LEFT OUTER JOIN `subjects` ON `subjects`.id = `questions`.subject_id
LEFT OUTER JOIN `taggings` ON `taggings`.taggable_id = `questions`.id
AND `taggings`.taggable_type = 'Question'
LEFT OUTER JOIN `tags` ON `tags`.id = `taggings`.tag_id
LEFT OUTER JOIN `gradings` ON gradings.question_id = questions.id
LEFT OUTER JOIN `age_groups` ON `age_groups`.id = `gradings`.age_group_id
LEFT OUTER JOIN `difficulties` ON `difficulties`.id = `gradings`.difficulty_id
WHERE (((questions.subject_id = '1') or (questions.subject_id = 19 and tags.name = 'English')))
AND `questions`.id IN (602, 634, 666, 698, 730, 762, 613, 645, 677, 709, 741, 592, 624, 656, 688, 720, 752, 603, 635, 667, 699, 731, 763, 614, 646, 678, 710, 742, 593, 625)
ORDER BY subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id
SELECT count(DISTINCT `questions`.id) AS count_all FROM `questions`
LEFT OUTER JOIN `answers` ON answers.question_id = questions.id
LEFT OUTER JOIN `quiz_questions` ON (`questions`.`id` = `quiz_questions`.`question_id`)
LEFT OUTER JOIN `quizzes` ON (`quizzes`.`id` = `quiz_questions`.`quiz_id`)
LEFT OUTER JOIN `subjects` ON `subjects`.id = `questions`.subject_id
LEFT OUTER JOIN `taggings` ON `taggings`.taggable_id = `questions`.id
AND `taggings`.taggable_type = 'Question'
LEFT OUTER JOIN `tags` ON `tags`.id = `taggings`.tag_id
LEFT OUTER JOIN `gradings` ON gradings.question_id = questions.id
LEFT OUTER JOIN `age_groups` ON `age_groups`.id = `gradings`.age_group_id
LEFT OUTER JOIN `difficulties` ON `difficulties`.id = `gradings`.difficulty_id
WHERE (((questions.subject_id = '1') or (questions.subject_id = 19 and tags.name = 'English')))
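Actually, seeing these all nicely formatted here, there's a crazy amount of joining going on. Surely this can't be optimal. Anyway, it looks like I have two questions.
1) I have an index on every id and foreign key field referred to here. The second query above is the slowest, and calling explain on it (directly in mysql) gives the following: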
+----+-------------+----------------+--------+---------------------------------------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+--------+---------------------------------------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | questions | range | PRIMARY,index_questions_on_subject_id | PRIMARY | 4 | NULL | 30 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | answers | ref | index_answers_on_question_id | index_answers_on_question_id | 5 | millionaire_development.questions.id | 2 | |
| 1 | SIMPLE | quiz_questions | ref | index_quiz_questions_on_question_id | index_quiz_questions_on_question_id | 5 | millionaire_development.questions.id | 1 | |
| 1 | SIMPLE | quizzes | eq_ref | PRIMARY | PRIMARY | 4 | millionaire_development.quiz_questions.quiz_id | 1 | |
| 1 | SIMPLE | subjects | eq_ref | PRIMARY | PRIMARY | 4 | millionaire_development.questions.subject_id | 1 | |
| 1 | SIMPLE | taggings | ref | index_taggings_on_taggable_id_and_taggable_type,index_taggings_on_taggable_type | index_taggings_on_taggable_id_and_taggable_type | 263 | millionaire_development.questions.id,const | 1 | |
| 1 | SIMPLE | tags | eq_ref | PRIMARY | PRIMARY | 4 | millionaire_development.taggings.tag_id | 1 | Using where |
| 1 | SIMPLE | gradings | ref | index_gradings_on_question_id | index_gradings_on_question_id | 5 | millionaire_development.questions.id | 2 | |
| 1 | SIMPLE | age_groups | eq_ref | PRIMARY | PRIMARY | 4 | millionaire_development.gradings.age_group_id | 1 | |
| 1 | SIMPLE | difficulties | eq_ref | PRIMARY | PRIMARY | 4 | millionaire_development.gradings.difficulty_id | 1 | |
+----+-------------+----------------+--------+---------------------------------------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------+------+----------------------------------------------+
The query_reviewer plugin has this to say about it - it lists several problems:
Table questions: Using temporary table, Long key length (263), Using filesort
MySQL must do an extra pass to find out how to retrieve the rows in sorted order.
To resolve the query, MySQL needs to create a temporary table to hold the result.
The key used for the index was rather long, potentially affecting indices in memory
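2) It looks as though rails isn't splitting this find up in a very optimal way. Is it, do you think? Would I be better off doing several find queries manually rather than one big combined find?
Grateful for any advice, max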
Answer 0 (score: 2)
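Normally, ActiveRecord preloads associations with separate queries, since that is generally faster.
However, when it notices that you use an included association in :conditions or :order, it runs one big query that joins all of the included tables, not only the ones that are actually needed.
What you could do is include only the tables that are used in the conditions and order, and then preload all the other associations: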
questions = Question.paginate(:all, {:page=>1, :include => [:subject, {:taggings=>:tag}, :gradings], :conditions=>["((questions.subject_id = ?) or (questions.subject_id = ? and tags.name = ?))", "1", 19, "English"], :order=>"subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id", :per_page=>30})
Question.send(:preload_associations, questions, [:answers, :quizzes, {:gradings=>[:age_group, :difficulty]}])
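The first query will run on the subjects, taggings, tags, gradings and questions tables, since they are used in the conditions/order. :answers, :quizzes, :age_group and :difficulty will then be loaded with 4 separate simple queries.
After that you can try to optimize further with indexes, etc.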
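To decide which associations belong in :include and which can be preloaded, one option is to check which tables the :conditions and :order strings actually mention. The following is only a rough sketch: split_includes and the table_for mapping are hypothetical names made up for this example (they are not part of Rails or will_paginate), and it only looks at top-level associations.

```ruby
# Hypothetical helper (not part of Rails or will_paginate): split a list of
# associations into those whose tables are mentioned in the given SQL
# fragments (these need to stay in :include) and the rest (these can be
# preloaded with separate queries).
#
# table_for maps each top-level association name to the tables it pulls in.
# The mapping is an assumption supplied by the caller.
def split_includes(includes, table_for, *sql_fragments)
  sql = sql_fragments.join(" ").downcase
  includes.partition do |assoc|
    # A nested include such as { :taggings => :tag } is keyed by its
    # top-level association (single-key hashes only).
    name = assoc.is_a?(Hash) ? assoc.keys.first : assoc
    table_for.fetch(name).any? { |table| sql =~ /\b#{Regexp.escape(table)}\./ }
  end
end

includes = [:answers, :quizzes, :subject, { :taggings => :tag },
            { :gradings => [:age_group, :difficulty] }]

# Table names taken from the generated SQL in the question.
table_for = {
  :answers  => %w[answers],
  :quizzes  => %w[quiz_questions quizzes],
  :subject  => %w[subjects],
  :taggings => %w[taggings tags],
  :gradings => %w[gradings age_groups difficulties]
}

conditions = "((questions.subject_id = ?) or (questions.subject_id = ? and tags.name = ?))"
order = "subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id"

joined, preloaded = split_includes(includes, table_for, conditions, order)
p joined     # associations to keep in :include
p preloaded  # associations to preload afterwards
```

Because the sketch works on whole top-level entries, it keeps all of { :gradings => [:age_group, :difficulty] } on the joined side. The call above goes one step further by hand: it joins only :gradings and preloads :age_group and :difficulty separately.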
Answer 1 (score: 0)
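The output from query_reviewer says that mysql has to make an extra pass because of the order by clause. Just remove the :order part from the call to test whether that is the problem. If it is, the query should run noticeably faster.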
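One way to make that comparison less of a guess is to time both variants with Ruby's standard Benchmark module. This is a minimal sketch: average_seconds is a hypothetical helper written for this example, and the sleep call only stands in for the real query so the snippet runs outside the application.

```ruby
require 'benchmark'

# Hypothetical helper: run the given block `runs` times and return the
# average wall-clock time per run, in seconds.
def average_seconds(runs = 5)
  total = 0.0
  runs.times { total += Benchmark.realtime { yield } }
  total / runs
end

# Stand-in workload so the sketch can run on its own.
elapsed = average_seconds(3) { sleep 0.01 }
puts format("average: %.4f s", elapsed)

# In a Rails console the blocks would wrap the real calls instead, e.g.
# (find_options is a placeholder for the options hash from the question):
#
#   with_order    = average_seconds { Question.paginate(:all, find_options).to_a }
#   without_order = average_seconds { Question.paginate(:all, find_options.reject { |k, _| k == :order }).to_a }
```

Repeated runs of the same query may be served partly from caches, so the numbers are only a rough guide; a large gap between the two variants is the thing to look for.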
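Your query looks very complex. Are you sure you need to load such complex data? If you don't access most of the fields in the associated models, it is better not to include them.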