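Recently I had to retrieve a large amount of data from a MySQL database whose tables hold thousands of records. This was the first time I had worked with a data set this large, so I hadn't given any thought to the efficiency of my SQL statements. That's where the problem came in.
Here are the database tables (it's just a simple model of a course-selection system):
course: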
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| course_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
| lecturer | varchar(20) | NO | | NULL | |
| credit | float | NO | | NULL | |
| week_from | tinyint(3) unsigned | NO | | NULL | |
| week_to | tinyint(3) unsigned | NO | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
select:
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| select_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| card_no | int(10) unsigned | NO | | NULL | |
| course_id | int(10) unsigned | NO | | NULL | |
| term | varchar(7) | NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
When I want to retrieve all the courses a student has selected (given his card number), the SQL statement is:
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` WHERE course_id IN (
SELECT course_id FROM `select` WHERE card_no=<student's card number>
);
However, it is extremely slow: it ran for a long time without returning anything.
So I changed the WHERE IN clause to a NATURAL JOIN. Here is the SQL:
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select` NATURAL JOIN `course`
WHERE card_no=<student's card number>;
It returns immediately and works fine!
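NATURAL JOIN joins on every column name the two tables have in common. For `select` and `course` that is only `course_id`, so the query above should be equivalent to an explicit join on that column. Spelling the column out protects the query if a later schema change adds another shared column name. A sketch, where `@card_no` and its value are hypothetical stand-ins for the student's card number:

```sql
SET @card_no = 1;  -- hypothetical card number; substitute a real one

SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select`
JOIN `course` USING (course_id)  -- same as the NATURAL JOIN here: course_id is the only shared column
WHERE card_no = @card_no;
```

One semantic difference from the WHERE IN version is worth checking: if the same card number has several rows in `select` with the same `course_id` (for example, in different terms), the join returns that course once per row, while WHERE IN returns it only once. Adding DISTINCT restores the WHERE IN behavior if that matters.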
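So my questions are:
- What is the difference between the NATURAL JOIN and WHERE IN clauses?
- Why do they perform so differently? (Does it have to do with INDEX?)
- When should I use NATURAL JOIN, and when WHERE IN?
Answer 0 (score: 5)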
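In theory the two queries are equivalent. I think it's just a poor implementation of the MySQL query optimizer that makes JOIN more efficient than WHERE IN, so I always use JOIN.
Did you look at the EXPLAIN output for both queries? Here's what I get for the WHERE IN version: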
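+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+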
| 1 | PRIMARY | t_users | ALL | NULL | NULL | NULL | NULL | 2458304 | Using where |
| 2 | DEPENDENT SUBQUERY | t_user_attributes | index_subquery | PRIMARY,attribute | PRIMARY | 13 | func,const | 7 | Using index; Using where |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
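It is clearly scanning every row of the main table (type ALL, about 2.4 million rows) and, for each one, running the dependent subquery to test whether the value is in it; no index is used on the main table. For the JOIN I get: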
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| 1 | SIMPLE | t_user_attributes | ref | PRIMARY,attribute | attribute | 1 | const | 15 | Using where |
| 1 | SIMPLE | t_users | eq_ref | username,username_2 | username | 12 | bbodb_test.t_user_attributes.username | 1 | |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
Now it uses an index on both tables.
Answer 1 (score: 3)
Try this:
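To run the same check against the question's own schema, prefix each query with EXPLAIN and compare the `type`, `key` and `rows` columns. The plans you see may differ from the ones above depending on your MySQL version and the indexes present. A sketch, again using a hypothetical `@card_no`:

```sql
SET @card_no = 1;  -- hypothetical card number

-- Plan for the WHERE IN version from the question
EXPLAIN
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course`
WHERE course_id IN (SELECT course_id FROM `select` WHERE card_no = @card_no);

-- Plan for the NATURAL JOIN version from the question
EXPLAIN
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select` NATURAL JOIN `course`
WHERE card_no = @card_no;
```

A `type` of ALL with a large `rows` estimate on a table is the full scan described above.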
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` c
WHERE c.course_id IN (
SELECT s.course_id
FROM `select` s
WHERE card_no=<student's card number>
AND c.course_id = s.course_id
);
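Note the AND clause added to the subquery. This is called a correlated subquery, because it ties the two course_id columns together, just as the NATURAL JOIN does.
I think Barmar's explanation about the indexes is correct.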
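The same correlation is more often written with EXISTS, which states directly that a matching row must exist for each course. A sketch that should return the same rows as the query above, with `@card_no` as a hypothetical placeholder:

```sql
SET @card_no = 1;  -- hypothetical card number

SELECT c.course_id, c.name, c.lecturer, c.credit, c.week_from, c.week_to
FROM `course` c
WHERE EXISTS (
    SELECT 1
    FROM `select` s
    WHERE s.card_no = @card_no
      AND s.course_id = c.course_id  -- correlation with the outer row
);
```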
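Applied to the question's schema: `select` has a key only on `select_id`, so the filter `card_no = ...` has to scan the whole table whichever form of the query is used. `course.course_id` is already the primary key, so the lookup into `course` is cheap. A minimal sketch of the missing index, assuming you are free to alter the table (the index name `idx_card_course` is hypothetical):

```sql
-- card_no serves the WHERE filter; course_id can then be read from the index itself
ALTER TABLE `select` ADD INDEX idx_card_course (card_no, course_id);
```

Re-running the EXPLAIN statements afterwards shows whether the optimizer picks the index up.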