Question

I have the following two tables:

StudentCourse
- Id
- StudentId
- CourseId

Unique index on (StudentId, CourseId).

StudentCourseCount
- Id
- Student1Id
- Student2Id
- CourseCount

Index on (Student1Id, CourseCount).

Index on (Student2Id, CourseCount).
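For experimenting, the two tables can be written out as DDL. This is a minimal sketch using Python's standard sqlite3 module rather than the asker's database (the EXPLAIN output below resembles MySQL's). The snake_case column names follow the queries in the question; the index names (ux_student_course, ix_sc_s1, ix_sc_s2) and the helpers create_schema and index_names are hypothetical.

```python
from __future__ import annotations

import sqlite3

# Hypothetical DDL mirroring the question's two tables. Column names use the
# snake_case spelling from the question's queries; index names are made up.
SCHEMA = """
CREATE TABLE StudentCourse (
    id         INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL
);
-- Unique index on (StudentId, CourseId)
CREATE UNIQUE INDEX ux_student_course ON StudentCourse (student_id, course_id);

CREATE TABLE StudentCourseCount (
    id           INTEGER PRIMARY KEY,
    student1_id  INTEGER NOT NULL,
    student2_id  INTEGER NOT NULL,
    course_count INTEGER NOT NULL
);
-- Index on (Student1Id, CourseCount) and on (Student2Id, CourseCount)
CREATE INDEX ix_sc_s1 ON StudentCourseCount (student1_id, course_count);
CREATE INDEX ix_sc_s2 ON StudentCourseCount (student2_id, course_count);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables and their indexes in the given connection."""
    conn.executescript(SCHEMA)


def index_names(conn: sqlite3.Connection, table: str) -> set[str]:
    """Return the names of the indexes SQLite reports for `table`."""
    # PRAGMA index_list returns one row per index; column 1 is the index name.
    return {row[1] for row in conn.execute(f"PRAGMA index_list({table})")}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(sorted(index_names(conn, "StudentCourse")))
    print(sorted(index_names(conn, "StudentCourseCount")))
```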

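Given a CourseId, I list the students who took that course. The key thing I want to accomplish is: given one student, list the other students they have previously taken courses with.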

I am trying the following query:

SELECT * FROM StudentCourseCount sc
INNER JOIN StudentCourse s1 ON s1.course_id = <id> AND sc.student1_id = s1.student_id
INNER JOIN StudentCourse s2 ON s2.course_id = <id> AND sc.student2_id = s2.student_id
WHERE sc.course_count > 1
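To see what this query returns, here is a small sketch that runs it with Python's built-in sqlite3 on made-up rows. Assumptions not taken from the question: snake_case column names as in the query, a named parameter `:course_id` in place of `<id>`, explicit columns instead of `SELECT *`, and StudentCourseCount derived from StudentCourse as one row per pair with student1_id < student2_id. The helpers build_db and pairs_in_course are hypothetical.

```python
from __future__ import annotations

import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory copy of the question's schema, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StudentCourse (
            id INTEGER PRIMARY KEY, student_id INTEGER, course_id INTEGER);
        CREATE UNIQUE INDEX ux_student_course ON StudentCourse (student_id, course_id);
        CREATE TABLE StudentCourseCount (
            id INTEGER PRIMARY KEY, student1_id INTEGER,
            student2_id INTEGER, course_count INTEGER);
        CREATE INDEX ix_sc_s1 ON StudentCourseCount (student1_id, course_count);
        CREATE INDEX ix_sc_s2 ON StudentCourseCount (student2_id, course_count);
    """)
    # Made-up enrollments as (student_id, course_id).
    enrollments = [(1, 10), (2, 10), (3, 10), (1, 11), (2, 11), (2, 12), (3, 12),
                   (1, 13), (2, 13), (1, 14), (4, 14), (1, 15), (4, 15)]
    conn.executemany(
        "INSERT INTO StudentCourse (student_id, course_id) VALUES (?, ?)", enrollments)
    # Assumption: one row per pair (student1_id < student2_id) holding the
    # number of courses the two students share.
    conn.execute("""
        INSERT INTO StudentCourseCount (student1_id, student2_id, course_count)
        SELECT a.student_id, b.student_id, COUNT(*)
        FROM StudentCourse a
        JOIN StudentCourse b
          ON a.course_id = b.course_id AND a.student_id < b.student_id
        GROUP BY a.student_id, b.student_id
    """)
    return conn


# The question's query, with :course_id in place of <id>.
QUESTION_QUERY = """
    SELECT sc.student1_id, sc.student2_id, sc.course_count
    FROM StudentCourseCount sc
    INNER JOIN StudentCourse s1
        ON s1.course_id = :course_id AND sc.student1_id = s1.student_id
    INNER JOIN StudentCourse s2
        ON s2.course_id = :course_id AND sc.student2_id = s2.student_id
    WHERE sc.course_count > 1
    ORDER BY sc.student1_id, sc.student2_id
"""


def pairs_in_course(conn: sqlite3.Connection, course_id: int) -> list[tuple]:
    """Pairs of students who both took course_id and share more than one course."""
    return conn.execute(QUESTION_QUERY, {"course_id": course_id}).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(pairs_in_course(conn, 10))  # [(1, 2, 3), (2, 3, 2)]
```

Pair (1, 3) shares only course 10, so `course_count > 1` drops it; pair (1, 4) shares two courses, but student 4 never took course 10, so the joins drop it.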

The query works as expected; however, it is very slow on my very large table (more than 10,000,000 rows).

When I EXPLAIN the query, no index is used on StudentCourseCount. It correctly identifies Student1Id and Student2Id as possible keys, but it does not use them.

Execution plan:
table: sc, possible keys: Student1Id, Student2Id, key: NULL, rows: 28648392
table: c2, key: student_id, rows: 1
table: c1, key: student_id, rows: 1

The first table (sc) is clearly being scanned in full instead of being filtered quickly through a key.
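One way to see the difference between scanning sc and starting from the course filter is SQLite's EXPLAIN QUERY PLAN. This is an illustration under stated assumptions, not a diagnosis of the asker's MySQL plan: it adds a hypothetical index ix_course_student on StudentCourse (course_id, student_id), which the question does not have (the existing unique index leads with student_id, so it cannot be used to seek on course_id alone), and it uses CROSS JOIN, which in SQLite keeps the left table in the outer loop. The helpers build_db and plan_details are hypothetical.

```python
from __future__ import annotations

import sqlite3


def build_db() -> sqlite3.Connection:
    """Empty in-memory copy of the question's schema plus one extra index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StudentCourse (
            id INTEGER PRIMARY KEY, student_id INTEGER, course_id INTEGER);
        CREATE UNIQUE INDEX ux_student_course ON StudentCourse (student_id, course_id);
        CREATE TABLE StudentCourseCount (
            id INTEGER PRIMARY KEY, student1_id INTEGER,
            student2_id INTEGER, course_count INTEGER);
        CREATE INDEX ix_sc_s1 ON StudentCourseCount (student1_id, course_count);
        CREATE INDEX ix_sc_s2 ON StudentCourseCount (student2_id, course_count);
        -- Hypothetical index, NOT in the question: it leads with course_id,
        -- so a course_id = ? filter can seek into it.
        CREATE INDEX ix_course_student ON StudentCourse (course_id, student_id);
    """)
    return conn


# In SQLite the left table of a CROSS JOIN stays in the outer loop, so this
# plan starts from the course filter on s1 and then probes sc once per
# matching student.
PLAN_QUERY = """
    EXPLAIN QUERY PLAN
    SELECT sc.student1_id, sc.student2_id, sc.course_count
    FROM StudentCourse s1
    CROSS JOIN StudentCourseCount sc
    WHERE s1.course_id = :course_id
      AND sc.student1_id = s1.student_id
      AND sc.course_count > 1
"""


def plan_details(conn: sqlite3.Connection, sql: str, params: dict) -> list[str]:
    """Return the human-readable step of each EXPLAIN QUERY PLAN row."""
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_db()
    for step in plan_details(conn, PLAN_QUERY, {"course_id": 10}):
        print(step)
```

The printed steps should show s1 searched through ix_course_student and sc searched through ix_sc_s1. MySQL's optimizer makes its own choices, so this only shows the shape of a plan that avoids the full scan.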

Answer 1

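It looks like you should put the course_id filter in the outer select. The only filter you have on StudentCourseCount sc is course_count. Assuming you are searching for only one course_id, you should have sc.course_count > 1 AND sc.course_id = id. Otherwise, your joins try to apply the filter to the sc.course_count > 1 result set.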

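Assuming the values are evenly distributed, this query (or a variant of it) should be efficient. 10M rows is not very large, but it is large enough that the query needs to be optimized.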

Answer 2

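I think Brent Baisley has a good point; I hadn't noticed the <id> at first. I think you want both students to be in the same course, so you can link them in the join and put the course_id=<id> condition in the WHERE clause. I'd expect the optimizer to do this on its own, but it's worth a try: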

SELECT * FROM StudentCourseCount sc
INNER JOIN StudentCourse s1 ON sc.student1_id = s1.student_id
INNER JOIN StudentCourse s2 ON s2.course_id = s1.course_id AND sc.student2_id = s2.student_id
WHERE sc.course_count > 1 AND s1.course_id = <id>
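Whether this rewrite returns the same rows as the question's query can be checked on toy data. A minimal sketch in SQLite, assuming made-up enrollments, StudentCourseCount derived as one row per pair with student1_id < student2_id, snake_case columns, and `:course_id` in place of `<id>`; the helper names are hypothetical.

```python
from __future__ import annotations

import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory copy of the question's schema, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StudentCourse (
            id INTEGER PRIMARY KEY, student_id INTEGER, course_id INTEGER);
        CREATE UNIQUE INDEX ux_student_course ON StudentCourse (student_id, course_id);
        CREATE TABLE StudentCourseCount (
            id INTEGER PRIMARY KEY, student1_id INTEGER,
            student2_id INTEGER, course_count INTEGER);
    """)
    enrollments = [(1, 10), (2, 10), (3, 10), (1, 11), (2, 11), (2, 12), (3, 12),
                   (1, 13), (2, 13), (1, 14), (4, 14), (1, 15), (4, 15)]
    conn.executemany(
        "INSERT INTO StudentCourse (student_id, course_id) VALUES (?, ?)", enrollments)
    # Assumption: one row per pair (student1_id < student2_id) with the number
    # of shared courses.
    conn.execute("""
        INSERT INTO StudentCourseCount (student1_id, student2_id, course_count)
        SELECT a.student_id, b.student_id, COUNT(*)
        FROM StudentCourse a
        JOIN StudentCourse b
          ON a.course_id = b.course_id AND a.student_id < b.student_id
        GROUP BY a.student_id, b.student_id
    """)
    return conn


# The question's original query.
ORIGINAL = """
    SELECT sc.student1_id, sc.student2_id, sc.course_count
    FROM StudentCourseCount sc
    INNER JOIN StudentCourse s1
        ON s1.course_id = :course_id AND sc.student1_id = s1.student_id
    INNER JOIN StudentCourse s2
        ON s2.course_id = :course_id AND sc.student2_id = s2.student_id
    WHERE sc.course_count > 1
    ORDER BY sc.student1_id, sc.student2_id
"""

# This answer's rewrite: s2 is tied to s1's course, course filter in WHERE.
REWRITE = """
    SELECT sc.student1_id, sc.student2_id, sc.course_count
    FROM StudentCourseCount sc
    INNER JOIN StudentCourse s1 ON sc.student1_id = s1.student_id
    INNER JOIN StudentCourse s2
        ON s2.course_id = s1.course_id AND sc.student2_id = s2.student_id
    WHERE sc.course_count > 1 AND s1.course_id = :course_id
    ORDER BY sc.student1_id, sc.student2_id
"""


def run(conn: sqlite3.Connection, sql: str, course_id: int) -> list[tuple]:
    """Execute one of the two queries for a single course."""
    return conn.execute(sql, {"course_id": course_id}).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for course_id in (10, 11, 12, 13, 14, 15):
        assert run(conn, ORIGINAL, course_id) == run(conn, REWRITE, course_id)
    print(run(conn, REWRITE, 10))  # [(1, 2, 3), (2, 3, 2)]
```

Because (student_id, course_id) is unique, fixing s1.course_id to one value leaves at most one s1 row and one s2 row per pair, so the two forms cannot differ by duplicate rows either.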

Answer 3

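This is a very large query, and it returns a very large result set. I'm not sure you can optimize it much, given the amount of data being returned.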

SELECT *
FROM StudentCourseCount sc INNER JOIN
     StudentCourse s1
     ON s1.course_id = <id> AND sc.student1_id = s1.student_id INNER JOIN
     StudentCourse s2
     ON s2.course_id = <id> AND sc.student2_id = s2.student_id
WHERE sc.course_count > 1;

The indexes you want on the tables are StudentCourseCount(course_count, student_id) and StudentCourse(student_id, course_id).

Now, you say the query works, by which I assume you mean that you like the results. It answers the following question:

Get all students who have taken course <id> and have taken more than one course.

That is very different from:

Given a student, I want to list the other students they have previously taken courses with.

If that is your real question, I suggest asking another question on Stack Overflow to get a better query.
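For the second formulation (given a student, list the students they have taken courses with), a query can start from StudentCourseCount alone. A hedged sketch in SQLite, assuming the given student may be stored in either column of a pair, so both columns are searched and UNION combines the two directions. Each branch filters on the leading column of one of the question's existing indexes, (Student1Id, CourseCount) or (Student2Id, CourseCount), which at least makes an index seek possible. The data and the helpers build_db and others_of are hypothetical.

```python
from __future__ import annotations

import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory copy of the question's schema, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StudentCourse (
            id INTEGER PRIMARY KEY, student_id INTEGER, course_id INTEGER);
        CREATE UNIQUE INDEX ux_student_course ON StudentCourse (student_id, course_id);
        CREATE TABLE StudentCourseCount (
            id INTEGER PRIMARY KEY, student1_id INTEGER,
            student2_id INTEGER, course_count INTEGER);
        CREATE INDEX ix_sc_s1 ON StudentCourseCount (student1_id, course_count);
        CREATE INDEX ix_sc_s2 ON StudentCourseCount (student2_id, course_count);
    """)
    enrollments = [(1, 10), (2, 10), (3, 10), (1, 11), (2, 11), (2, 12), (3, 12),
                   (1, 13), (2, 13), (1, 14), (4, 14), (1, 15), (4, 15)]
    conn.executemany(
        "INSERT INTO StudentCourse (student_id, course_id) VALUES (?, ?)", enrollments)
    # Assumption: one row per pair with the number of shared courses.
    conn.execute("""
        INSERT INTO StudentCourseCount (student1_id, student2_id, course_count)
        SELECT a.student_id, b.student_id, COUNT(*)
        FROM StudentCourse a
        JOIN StudentCourse b
          ON a.course_id = b.course_id AND a.student_id < b.student_id
        GROUP BY a.student_id, b.student_id
    """)
    return conn


# Look the student up in each column; each branch can use one existing index.
OTHERS_QUERY = """
    SELECT student2_id, course_count FROM StudentCourseCount
    WHERE student1_id = :student_id
    UNION
    SELECT student1_id, course_count FROM StudentCourseCount
    WHERE student2_id = :student_id
    ORDER BY 1
"""


def others_of(conn: sqlite3.Connection, student_id: int) -> list[tuple]:
    """(other_student_id, shared_course_count) for every co-student."""
    return conn.execute(OTHERS_QUERY, {"student_id": student_id}).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(others_of(conn, 2))  # [(1, 3), (3, 2)]
```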

Slow query caused by an unused database index
