NOT IN subquery vs. ON != operation

Time: 2018-10-29 19:41:20

Tags: mysql query-optimization

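I have two tables, ny_clean (3,454,602 rows) and pickup_0_ids_temp_table (2,739,268 rows). Each has an id CHAR(11) column that is the primary key and also has a BTREE index on it (MySQL 5.7).

The id column of pickup_0_ids_temp_table is a subset of the ids in ny_clean. The result I want is ny_clean without the rows whose id appears in pickup_0_ids_temp_table.

Option 1: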

EXPLAIN
SELECT *
FROM pickup_0_ids_temp_table as t
JOIN ny_clean as n
ON n.id != t.id;
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
| id | select_type | table    | partitions | type  | possible_keys | key               | key_len | ref  | rows    | filtered | Extra                                                           |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | t        | NULL       | index | NULL          | PRIMARY           | 11      | NULL | 2734512 |   100.00 | Using index                                                     |
|  1 | SIMPLE      | ny_clean | NULL       | index | NULL          | btree_pk_ny_clean | 11      | NULL | 3445904 |    90.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+

Option 2:

EXPLAIN
SELECT *
FROM ny_clean as n
WHERE n.id NOT IN (
    SELECT id 
    FROM pickup_0_ids_temp_table);
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
| id | select_type        | table                   | partitions | type            | possible_keys          | key     | key_len | ref  | rows    | filtered | Extra       |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
|  1 | PRIMARY            | n                       | NULL       | ALL             | NULL                   | NULL    | NULL    | NULL | 3445904 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | pickup_0_ids_temp_table | NULL       | unique_subquery | PRIMARY,btree_pickup_0 | PRIMARY | 11      | func |       1 |   100.00 | Using index |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+

I then use one of these options inside this larger query:

EXPLAIN
INSERT INTO y    
SELECT id, pickup_longitude, pickup_latitude 
FROM x
JOIN 
(OPTION 1 OR 2) as z
ON z.id =  x.id;

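When I used option 1 in the larger query, it ran for two days without finishing. Option 2, on the other hand, finished in under 30 minutes.

My question: why is that? Based on the MySQL documentation (https://dev.mysql.com/doc/refman/5.7/en/subquery-materialization.html), I suspect it is due to subquery materialization, but how would I check that?

Am I misreading the EXPLAIN output? From it, I would have expected option 1 to be faster, since it uses an index on both tables.

Or does it have to do with the larger query?

Thanks in advance.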

1 answer:

Answer 0 (score: 3)

Your option 1 doesn't do what you think it does.

If you have two tables, n and t, each holding the ids 1, 2, 3:

n.id  t.id
1     1
2     2
3     3

Joining them with ON n.id != t.id gives you every pair of rows whose ids differ:

n.id  t.id
1     2
1     3
2     1
2     3
3     1
3     2

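That is almost a Cartesian product: only the pairs with equal ids are dropped. So for your tables it is roughly 3.4M x 2.7M ~ 9.18 x 10^12 rows, which is trillions, not millions.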
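To make the row count concrete, here is a minimal sketch that runs the three-row example and then scales the same reasoning to the table sizes quoted in the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the snippet is self-contained); the tables n and t are the toy tables from the example, not the real ones.

```python
import sqlite3

# SQLite stands in for MySQL here; an inequality join behaves the same way.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE n (id INTEGER PRIMARY KEY);
    CREATE TABLE t (id INTEGER PRIMARY KEY);
    INSERT INTO n (id) VALUES (1), (2), (3);
    INSERT INTO t (id) VALUES (1), (2), (3);
""")

# Same shape as option 1: every (n, t) pair whose ids differ.
pairs = conn.execute(
    "SELECT n.id, t.id FROM t JOIN n ON n.id != t.id ORDER BY n.id, t.id"
).fetchall()
print(len(pairs), "rows from a 3 x 3 inequality join")

# Scale up using the row counts from the question. Every id in the smaller
# table equals exactly one id in the larger table (ids are unique and the
# smaller set is a subset), so each of its rows pairs with all but one row.
n_rows = 3_454_602  # ny_clean
t_rows = 2_739_268  # pickup_0_ids_temp_table
joined_rows = t_rows * (n_rows - 1)
print(f"about {joined_rows:.2e} rows from option 1")
```

The small case shows that the join only removes the equal-id pairs, which is why the full-size result is on the order of the product of the two row counts.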

Then you try to JOIN against that result, and because the materialized table has no index, it takes a very long time.
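What the question actually asks for is an anti-join: rows of ny_clean with no matching id in pickup_0_ids_temp_table. The sketch below compares three common ways to write it (NOT IN, NOT EXISTS, and LEFT JOIN ... IS NULL) against the inequality join. It again uses sqlite3 in place of MySQL, and the four sample ids are made up for illustration. The three forms agree here because id is declared NOT NULL; NOT IN behaves differently when the subquery can return NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the table and column names come from the question.
conn.executescript("""
    CREATE TABLE ny_clean (id CHAR(11) PRIMARY KEY NOT NULL);
    CREATE TABLE pickup_0_ids_temp_table (id CHAR(11) PRIMARY KEY NOT NULL);
    INSERT INTO ny_clean (id) VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO pickup_0_ids_temp_table (id) VALUES ('a'), ('c');
""")

# Option 2 from the question.
not_in = conn.execute("""
    SELECT n.id FROM ny_clean AS n
    WHERE n.id NOT IN (SELECT id FROM pickup_0_ids_temp_table)
    ORDER BY n.id
""").fetchall()

# Equivalent anti-join written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT n.id FROM ny_clean AS n
    WHERE NOT EXISTS (
        SELECT 1 FROM pickup_0_ids_temp_table AS t WHERE t.id = n.id)
    ORDER BY n.id
""").fetchall()

# Equivalent anti-join written as LEFT JOIN plus an IS NULL filter.
left_join = conn.execute("""
    SELECT n.id FROM ny_clean AS n
    LEFT JOIN pickup_0_ids_temp_table AS t ON t.id = n.id
    WHERE t.id IS NULL
    ORDER BY n.id
""").fetchall()

# Option 1 from the question: an inequality join, which is not an anti-join.
inequality_rows = conn.execute("""
    SELECT COUNT(*) FROM pickup_0_ids_temp_table AS t
    JOIN ny_clean AS n ON n.id != t.id
""").fetchone()[0]

print("anti-join result:", not_in)
print("inequality join row count:", inequality_rows)
```

On this sample the anti-join returns the two unmatched ids, while the inequality join returns more rows than ny_clean even contains, which is the same blow-up described above at a much smaller scale.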