Query takes a very long time (EXPLAIN included)

Date: 2012-09-05 23:15:51

Tags: mysql performance

Goal of the query:

Show the racial breakdown by district.

Query:

SELECT school_data_schools_outer.district_id, 
       school_data_race_ethnicity_raw_outer.year,  
       school_data_race_ethnicity_raw_outer.race,
       ROUND( 
           SUM( school_data_race_ethnicity_raw_outer.count) /
                (SELECT SUM(count)
                   FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_inner
             INNER JOIN school_data_schools as school_data_schools_inner 
                  USING (school_id)
                  WHERE school_data_schools_outer.district_id = school_data_schools_inner.district_id 
                    AND school_data_race_ethnicity_raw_outer.year = school_data_race_ethnicity_raw_inner.year) * 100, 2)
      FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_outer
INNER JOIN school_data_schools as school_data_schools_outer USING (school_id)
  GROUP BY school_data_schools_outer.district_id, 
           school_data_race_ethnicity_raw_outer.year, 
           school_data_race_ethnicity_raw_outer.race

mysql> explain SELECT school_data_schools_outer.district_id, school_data_race_ethnicity_raw_outer.year, school_data_race_ethnicity_raw_outer.race,ROUND(SUM(school_data_race_ethnicity_raw_outer.count)/( SELECT SUM(count) FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_inner INNER JOIN school_data_schools as school_data_schools_inner USING (school_id) WHERE school_data_schools_outer.district_id = school_data_schools_inner.district_id and school_data_race_ethnicity_raw_outer.year = school_data_race_ethnicity_raw_inner.year ) * 100,2) FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_outer INNER JOIN school_data_schools as school_data_schools_outer USING (school_id) GROUP BY school_data_schools_outer.district_id, school_data_race_ethnicity_raw_outer.year, school_data_race_ethnicity_raw_outer.race;
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
| id | select_type        | table                                | type   | possible_keys              | key     | key_len | ref                                                                  | rows  | Extra                           |
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
|  1 | PRIMARY            | school_data_race_ethnicity_raw_outer | ALL    | school_id,school_id_2      | NULL    | NULL    | NULL                                                                 | 84012 | Using temporary; Using filesort |
|  1 | PRIMARY            | school_data_schools_outer            | eq_ref | PRIMARY                    | PRIMARY | 257     | rocdocs_main_drupal_7.school_data_race_ethnicity_raw_outer.school_id |     1 |                                 |
|  2 | DEPENDENT SUBQUERY | school_data_race_ethnicity_raw_inner | ref    | school_id,year,school_id_2 | year    | 4       | func                                                                 |  8402 |                                 |
|  2 | DEPENDENT SUBQUERY | school_data_schools_inner            | eq_ref | PRIMARY                    | PRIMARY | 257     | rocdocs_main_drupal_7.school_data_race_ethnicity_raw_inner.school_id |     1 | Using where                     |
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
4 rows in set (0.00 sec)

mysql>

mysql> describe school_data_race_ethnicity_raw;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int(11)      | NO   | PRI | NULL    | auto_increment |
| school_id | varchar(255) | NO   | MUL | NULL    |                |
| year      | int(11)      | NO   | MUL | NULL    |                |
| race      | varchar(255) | NO   |     | NULL    |                |
| count     | int(11)      | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

mysql> describe school_data_schools;
+-------------+----------------+------+-----+---------+-------+
| Field       | Type           | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| school_id   | varchar(255)   | NO   | PRI | NULL    |       |
| grade_level | varchar(255)   | NO   |     | NULL    |       |
| district_id | varchar(255)   | NO   |     | NULL    |       |
| school_name | varchar(255)   | NO   |     | NULL    |       |
| address     | varchar(255)   | NO   |     | NULL    |       |
| city        | varchar(255)   | NO   |     | NULL    |       |
| lat         | decimal(20,10) | NO   |     | NULL    |       |
| lon         | decimal(20,10) | NO   |     | NULL    |       |
+-------------+----------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

Note: I also tried:

select sds.school_id, 
  detail.year, 
  detail.race,
  ROUND((detail.count / summary.total) * 100 ,2) as percent 
FROM school_data_race_ethnicity_raw as detail
inner join school_data_schools as sds USING (school_id)
inner join (
  select sds2.district_id, year, sum(count) as total
  from school_data_race_ethnicity_raw
  inner join school_data_schools as sds2 USING (school_id)
  group by sds2.district_id, year
  ) as summary on summary.district_id = sds.district_id 
    and summary.year = detail.year

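2 answers:

Answer 0 (score: 0)

This is slow because:

  1. You are not using an index on school_data_race_ethnicity_raw_outer, so it scans all ~84,000 rows.
  2. You are using a correlated subquery, which means the expensive calculation has to run once per row, i.e. 84,000 times.
  3. The best approach is to avoid the correlated subquery altogether. If you can't, then to make it run fast you need covering indexes, so that the whole inner query (and the other parts, via their own indexes) can be answered from an index. For a great tutorial on indexing, check this. It taught me a lot! Right now your inner query only uses the year index on school_data_race_ethnicity_raw, so it has to read roughly 8,000 rows for each of the 84,000 calculations to find everything else it needs. An index would make this faster. For example, create a composite index on school_data_race_ethnicity_raw and you should find that it helps: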

    -- district_id lives in school_data_schools, so it cannot be part of this index
    CREATE INDEX inner_composite ON school_data_race_ethnicity_raw (year, school_id, count);
    

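    This lets the index supply the field used in the WHERE clause first, then the join field, then the field being selected. You should see it show up in the "key" column of the EXPLAIN output. Also, if you get it right, you will see 'Using index' in the rightmost column, indicating that no table access took place, which can be orders of magnitude faster.

    You can also experiment quick-and-dirty style: add lots of indexes on the columns the query mentions and see which one gets picked up in the key column. If one does, read through your query to see which other columns from that table are used, then add a new index with those columns appended on the right and check whether it does better. Once you have found the useful indexes, remember to drop the unused ones.

    MySQL does not let you index the SUM of a column directly, which would be the fastest option, so unless you want to move to another database (a good idea, if you can), this will always be somewhat slow.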
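As a rough illustration of the covering-index idea above, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption, chosen so the example runs anywhere), the index columns (year, school_id, count) are my guess at what the inner query needs from this table, and `covering_index_plan` is a hypothetical helper name. SQLite reports "COVERING INDEX" in its EXPLAIN QUERY PLAN output where MySQL's EXPLAIN would show 'Using index'.

```python
import sqlite3


def covering_index_plan():
    """Return SQLite's query-plan details for a per-year SUM(count) lookup."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE school_data_race_ethnicity_raw (
            id        INTEGER PRIMARY KEY,
            school_id TEXT    NOT NULL,
            year      INTEGER NOT NULL,
            race      TEXT    NOT NULL,
            count     INTEGER NOT NULL
        );
        -- Assumed column order: filter column, join column, summed column.
        CREATE INDEX inner_composite
            ON school_data_race_ethnicity_raw (year, school_id, count);
        """
    )
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT school_id, SUM(count) "
        "FROM school_data_race_ethnicity_raw "
        "WHERE year = ? "
        "GROUP BY school_id",
        (2012,),
    ).fetchall()
    conn.close()
    # The human-readable plan text is the fourth column of each row.
    return [row[3] for row in plan]


plan_details = covering_index_plan()
for detail in plan_details:
    print(detail)
```

Every column the query touches is in the index, so the table itself never has to be read. On the real MySQL tables the equivalent check would be rerunning EXPLAIN and looking at the key and Extra columns.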

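Answer 1 (score: -1)

This should be all you need to aggregate the data and get race by district. I'm not sure why you are doing so much math on the raw data; it isn't needed for your goal, and it is forcing some crazy subqueries.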

SELECT SUM(students.count) as studentCount, schools.district_id, students.race
FROM school_data_schools schools, 
school_data_race_ethnicity_raw students
WHERE schools.school_id = students.school_id
GROUP BY district_id, race

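You may also need an index on school_data_race_ethnicity_raw.school_id (on its own, not as part of a multi-column key).

Edit: I didn't realize the OP was looking for a percentage breakdown rather than just totals.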

SELECT ((studentCount / districtTotal) * 100) as percentage, district_id, race
FROM (
    SELECT SUM(students.count) as studentCount, schools.district_id, students.race,
      (SELECT SUM(inStudents.count)
       FROM school_data_schools inSchools, 
        school_data_race_ethnicity_raw inStudents
       WHERE inSchools.school_id = inStudents.school_id
       AND inSchools.district_id = schools.district_id
       GROUP BY inSchools.district_id) as districtTotal
    FROM school_data_schools schools, 
    school_data_race_ethnicity_raw students
    WHERE schools.school_id = students.school_id
    GROUP BY district_id, race
) table1

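This should run quickly, but you still need to make sure there is an index on school_data_race_ethnicity_raw.school_id that is not part of a multi-column index. You can see it in action here; my test case is fairly small, but it does check out.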
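The aggregate-first idea can also keep the year dimension from the original question without any per-row subquery. The following is a sketch rather than a tested MySQL solution: it uses Python's sqlite3 module as a stand-in for MySQL, cut-down tables, and made-up sample rows (schools s1 to s3, districts d1 and d2, races A and B are all hypothetical). It joins per-race totals to per-district totals and checks that the result matches the correlated form from the question.

```python
import sqlite3

SETUP = """
CREATE TABLE school_data_schools (
    school_id   TEXT PRIMARY KEY,
    district_id TEXT NOT NULL
);
CREATE TABLE school_data_race_ethnicity_raw (
    id        INTEGER PRIMARY KEY,
    school_id TEXT    NOT NULL,
    year      INTEGER NOT NULL,
    race      TEXT    NOT NULL,
    count     INTEGER NOT NULL
);
-- Made-up rows, purely for illustration.
INSERT INTO school_data_schools VALUES ('s1', 'd1'), ('s2', 'd1'), ('s3', 'd2');
INSERT INTO school_data_race_ethnicity_raw (school_id, year, race, count) VALUES
    ('s1', 2011, 'A', 50), ('s1', 2011, 'B', 50),
    ('s1', 2012, 'A', 10), ('s1', 2012, 'B', 30),
    ('s2', 2012, 'A', 20), ('s2', 2012, 'B', 40),
    ('s3', 2012, 'A', 5),  ('s3', 2012, 'B', 15);
"""

# Each grouped subquery runs once; the join then pairs every
# (district, year, race) total with its (district, year) total.
JOINED = """
SELECT per_race.district_id, per_race.year, per_race.race,
       ROUND(100.0 * per_race.race_total / per_district.district_total, 2)
FROM (
    SELECT s.district_id, r.year, r.race, SUM(r.count) AS race_total
    FROM school_data_race_ethnicity_raw AS r
    INNER JOIN school_data_schools AS s ON s.school_id = r.school_id
    GROUP BY s.district_id, r.year, r.race
) AS per_race
INNER JOIN (
    SELECT s.district_id, r.year, SUM(r.count) AS district_total
    FROM school_data_race_ethnicity_raw AS r
    INNER JOIN school_data_schools AS s ON s.school_id = r.school_id
    GROUP BY s.district_id, r.year
) AS per_district
   ON per_district.district_id = per_race.district_id
  AND per_district.year = per_race.year
ORDER BY per_race.district_id, per_race.year, per_race.race
"""

# The shape of the query in the question: the inner SUM is
# recomputed for every output group.
CORRELATED = """
SELECT so.district_id, ro.year, ro.race,
       ROUND(100.0 * SUM(ro.count) /
             (SELECT SUM(ri.count)
                FROM school_data_race_ethnicity_raw AS ri
                INNER JOIN school_data_schools AS si ON si.school_id = ri.school_id
               WHERE si.district_id = so.district_id
                 AND ri.year = ro.year), 2)
FROM school_data_race_ethnicity_raw AS ro
INNER JOIN school_data_schools AS so ON so.school_id = ro.school_id
GROUP BY so.district_id, ro.year, ro.race
ORDER BY so.district_id, ro.year, ro.race
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
joined_rows = conn.execute(JOINED).fetchall()
correlated_rows = conn.execute(CORRELATED).fetchall()
conn.close()

for row in joined_rows:
    print(row)
```

Both forms return the same rows on this sample. On MySQL I would expect EXPLAIN to list the two grouped subqueries as DERIVED rather than DEPENDENT SUBQUERY, but that is worth confirming on the real data before relying on it.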