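I have two tables, ny_clean (3454602 rows) and pickup_0_ids_temp_table (2739268 rows). Both have an id CHAR(11) column that is the primary key and also has a BTREE index on it (MySQL 5.7).
The id values in pickup_0_ids_temp_table are a subset of those in ny_clean. The result I want is ny_clean without the rows whose id appears in pickup_0_ids_temp_table.
Option 1: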
EXPLAIN SELECT * FROM pickup_0_ids_temp_table as t JOIN ny_clean as n ON n.id != t.id;
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
| id | select_type | table    | partitions | type  | possible_keys | key               | key_len | ref  | rows    | filtered | Extra                                                           |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
| 1  | SIMPLE      | t        | NULL       | index | NULL          | PRIMARY           | 11      | NULL | 2734512 | 100.00   | Using index                                                     |
| 1  | SIMPLE      | ny_clean | NULL       | index | NULL          | btree_pk_ny_clean | 11      | NULL | 3445904 | 90.00    | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
Option 2:
EXPLAIN SELECT * FROM ny_clean as n WHERE n.id NOT IN ( SELECT id FROM pickup_0_ids_temp_table);
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
| id | select_type        | table                   | partitions | type            | possible_keys          | key     | key_len | ref  | rows    | filtered | Extra       |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
| 1  | PRIMARY            | n                       | NULL       | ALL             | NULL                   | NULL    | NULL    | NULL | 3445904 | 100.00   | Using where |
| 2  | DEPENDENT SUBQUERY | pickup_0_ids_temp_table | NULL       | unique_subquery | PRIMARY,btree_pickup_0 | PRIMARY | 11      | func | 1       | 100.00   | Using index |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
I then use one of these options inside this larger query:
EXPLAIN INSERT INTO y SELECT id, pickup_longitude, pickup_latitude FROM x JOIN (OPTION 1 OR 2) as z ON z.id = x.id;
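When I used Option 1 in the larger query, it ran for two days without finishing. Option 2, on the other hand, finished in under 30 minutes.
My question: why is that? Based on the MySQL documentation (https://dev.mysql.com/doc/refman/5.7/en/subquery-materialization.html), I suspect it is due to subquery materialization, but how would I check?
Am I misreading the EXPLAIN output? Judging from it, I would expect Option 1 to be faster, since it uses an index on both tables.
Or does it have to do with the larger query?
Thanks in advance.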
Answer 0: (score: 3)
Your Option 1 does not do what you think it does.
Suppose you have two tables:
n.id t.id
1 1
2 2
3 3
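With the join condition
ON n.id != t.id;
you get every combination of rows whose ids differ: 6 of the 9 possible pairs here, not the rows of one table that are missing from the other.
That is almost a Cartesian product. For your tables that means roughly 3.4 million x 2.7 million ≈ 9.18 x 10^12 (about 9 trillion) rows.
The larger query then has to JOIN against that result, and because the materialized table has no index, this takes a very long time.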
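To see the difference between the two options on a small scale, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL 5.7 (so it shows what rows come back, not how MySQL executes the query), and the three-row and two-row table contents are made up for illustration. Only the table names and the two queries are taken from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL here; the sample ids are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ny_clean (id CHAR(11) PRIMARY KEY);
    CREATE TABLE pickup_0_ids_temp_table (id CHAR(11) PRIMARY KEY);
    INSERT INTO ny_clean VALUES ('1'), ('2'), ('3');
    -- The temp table holds a subset of the ny_clean ids, as in the question.
    INSERT INTO pickup_0_ids_temp_table VALUES ('1'), ('2');
""")

# Option 1: joins every row of one table to every non-matching row of the other.
option_1 = conn.execute("""
    SELECT n.id, t.id
    FROM pickup_0_ids_temp_table AS t
    JOIN ny_clean AS n ON n.id != t.id
    ORDER BY n.id, t.id
""").fetchall()

# Option 2: keeps only the ny_clean rows whose id is absent from the temp table.
option_2 = conn.execute("""
    SELECT n.id
    FROM ny_clean AS n
    WHERE n.id NOT IN (SELECT id FROM pickup_0_ids_temp_table)
""").fetchall()

print("Option 1 rows:", len(option_1), option_1)
print("Option 2 rows:", len(option_2), option_2)
```

With these sample rows, Option 1 returns all but the matching pairs of the 3 x 2 combinations, while Option 2 returns only the single id that is not in the temp table. The row count of Option 1 grows with the product of the two table sizes, which is why it blows up on tables with millions of rows.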