Question

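I was asked this in an interview: given the two tables below, write a query that returns the customers who have no sales orders. How many ways are there to write this query, and which one performs best?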

Table 1: Customer - CustomerID
Table 2: SalesOrder - OrderID, CustomerID, OrderDate

Query:

SELECT *
FROM Customer C
  RIGHT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.OrderID = NULL

Is my query correct, and are there other ways to write it that return the same result?

Answer 1

I can write this query in two other ways:

SELECT C.*
FROM Customer C
LEFT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.CustomerID IS NULL

SELECT C.*
FROM Customer C
WHERE NOT C.CustomerID IN(SELECT CustomerID FROM SalesOrder)
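To see why the NULL handling matters here, the following is a minimal sketch in Python using an in-memory SQLite database (an assumption for illustration; the thread itself targets SQL Server and MySQL). It shows that comparing with `= NULL`, as in the question, never matches any row, that `IS NULL` after a LEFT OUTER JOIN returns the customers without orders, and that `NOT IN` returns nothing once the subquery yields a NULL CustomerID. The sample rows and the `ids` helper are hypothetical. The question's query also joins in the wrong direction: a RIGHT OUTER JOIN from Customer to SalesOrder keeps every order rather than every customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY);
    CREATE TABLE SalesOrder (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER,            -- nullable on purpose
        OrderDate  TEXT
    );
    INSERT INTO Customer VALUES (1), (2), (3);
    INSERT INTO SalesOrder VALUES (1, 1, '2014-02-01');   -- only customer 1 ordered
""")

def ids(sql):
    """Hypothetical helper: run a query and return the first column, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

LEFT_JOIN = """
    SELECT C.CustomerID FROM Customer C
    LEFT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
    WHERE {predicate}"""
NOT_IN = """
    SELECT C.CustomerID FROM Customer C
    WHERE NOT C.CustomerID IN (SELECT CustomerID FROM SalesOrder)"""
NOT_EXISTS = """
    SELECT C.CustomerID FROM Customer C
    WHERE NOT EXISTS (SELECT 1 FROM SalesOrder SO
                      WHERE SO.CustomerID = C.CustomerID)"""

# "= NULL" evaluates to unknown for every row, so nothing passes the filter.
eq_null = ids(LEFT_JOIN.format(predicate="SO.OrderID = NULL"))
# "IS NULL" keeps exactly the customers that found no matching order.
is_null = ids(LEFT_JOIN.format(predicate="SO.CustomerID IS NULL"))
not_in_before = ids(NOT_IN)

# An order without a customer puts a NULL into the NOT IN subquery.
conn.execute("INSERT INTO SalesOrder VALUES (2, NULL, '2014-02-02')")
not_in_after = ids(NOT_IN)
not_exists_after = ids(NOT_EXISTS)
is_null_after = ids(LEFT_JOIN.format(predicate="SO.CustomerID IS NULL"))

print(eq_null, is_null, not_in_before)
print(not_in_after, not_exists_after, is_null_after)
```

If SalesOrder.CustomerID is declared NOT NULL, the `NOT IN` form is safe; otherwise the join form or `NOT EXISTS` is the more robust choice.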

Answer 2

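I'm answering for MySQL rather than SQL Server, because you only tagged the question with SQL Server later, and I assumed that for an interview question the DBMS would not matter to you. Note, though, that the queries I wrote are standard SQL and should run on every RDBMS. How each RDBMS executes them is a different matter.

I wrote this small procedure so that you have a test case. It creates the customers and orders tables as you specified, and I added primary and foreign keys, as one normally would. There are no other indexes, because every column worth indexing here is already a primary key. 250 customers are created, and 100 of them have placed an order (for convenience, none of them more than once). A dump of the data follows; I'm posting the script in case you want to play around a bit by increasing the numbers.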

delimiter $$
create procedure fill_table()
begin
  -- 250 customers with ids 1..250
  create table customers(customerId int primary key) engine=innodb;
  set @x = 1;
  while (@x <= 250) do
    insert into customers values(@x);
    set @x := @x + 1;
  end while;

  create table orders(
    orderId int auto_increment primary key,
    customerId int,
    orderDate timestamp,
    foreign key fk_customer (customerId) references customers(customerId)
  ) engine=innodb;

  -- one order each for 100 randomly chosen customers
  insert into orders(customerId, orderDate)
  select
    customerId,
    now() - interval customerId day
  from
    customers
  order by rand()
  limit 100;

end $$
delimiter ;

call fill_table();

For me, this resulted in the following:

CREATE TABLE `customers` (
  `customerId` int(11) NOT NULL,
  PRIMARY KEY (`customerId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `customers` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250);

CREATE TABLE `orders` (
  `orderId` int(11) NOT NULL AUTO_INCREMENT,
  `customerId` int(11) DEFAULT NULL,
  `orderDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`orderId`),
  KEY `fk_customer` (`customerId`),
  CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`customerId`) REFERENCES `customers` (`customerId`)
) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8;

INSERT INTO `orders` VALUES (1,247,'2013-06-24 19:50:07'),(2,217,'2013-07-24 19:50:07'),(3,8,'2014-02-18 20:50:07'),(4,40,'2014-01-17 20:50:07'),(5,52,'2014-01-05 20:50:07'),(6,80,'2013-12-08 20:50:07'),(7,169,'2013-09-10 19:50:07'),(8,135,'2013-10-14 19:50:07'),(9,115,'2013-11-03 20:50:07'),(10,225,'2013-07-16 19:50:07'),(11,112,'2013-11-06 20:50:07'),(12,243,'2013-06-28 19:50:07'),(13,158,'2013-09-21 19:50:07'),(14,24,'2014-02-02 20:50:07'),(15,214,'2013-07-27 19:50:07'),(16,25,'2014-02-01 20:50:07'),(17,245,'2013-06-26 19:50:07'),(18,182,'2013-08-28 19:50:07'),(19,166,'2013-09-13 19:50:07'),(20,69,'2013-12-19 20:50:07'),(21,85,'2013-12-03 20:50:07'),(22,44,'2014-01-13 20:50:07'),(23,103,'2013-11-15 20:50:07'),(24,19,'2014-02-07 20:50:07'),(25,33,'2014-01-24 20:50:07'),(26,102,'2013-11-16 20:50:07'),(27,41,'2014-01-16 20:50:07'),(28,94,'2013-11-24 20:50:07'),(29,43,'2014-01-14 20:50:07'),(30,150,'2013-09-29 19:50:07'),(31,218,'2013-07-23 19:50:07'),(32,131,'2013-10-18 19:50:07'),(33,77,'2013-12-11 20:50:07'),(34,2,'2014-02-24 20:50:07'),(35,45,'2014-01-12 20:50:07'),(36,230,'2013-07-11 19:50:07'),(37,101,'2013-11-17 20:50:07'),(38,31,'2014-01-26 20:50:07'),(39,56,'2014-01-01 20:50:07'),(40,176,'2013-09-03 19:50:07'),(41,223,'2013-07-18 19:50:07'),(42,145,'2013-10-04 19:50:07'),(43,26,'2014-01-31 20:50:07'),(44,62,'2013-12-26 20:50:07'),(45,195,'2013-08-15 19:50:07'),(46,153,'2013-09-26 19:50:07'),(47,179,'2013-08-31 19:50:07'),(48,104,'2013-11-14 20:50:07'),(49,7,'2014-02-19 20:50:07'),(50,209,'2013-08-01 19:50:07'),(51,86,'2013-12-02 20:50:07'),(52,110,'2013-11-08 20:50:07'),(53,204,'2013-08-06 19:50:07'),(54,187,'2013-08-23 19:50:07'),(55,114,'2013-11-04 20:50:07'),(56,38,'2014-01-19 20:50:07'),(57,236,'2013-07-05 19:50:07'),(58,79,'2013-12-09 20:50:07'),(59,96,'2013-11-22 20:50:07'),(60,37,'2014-01-20 20:50:07'),(61,207,'2013-08-03 19:50:07'),(62,22,'2014-02-04 20:50:07'),(63,120,'2013-10-29 20:50:07'),(64,200,'2013-08-10 19:50:07'),(65,51,'2014-01-06 20:50:07'),(66,181,'2013-08-29 19:50:07'),(67,4,'2014-02-22 20:50:07'),(68,123,'2013-10-26 19:50:07'),(69,108,'2013-11-10 20:50:07'),(70,55,'2014-01-02 20:50:07'),(71,76,'2013-12-12 20:50:07'),(72,6,'2014-02-20 20:50:07'),(73,18,'2014-02-08 20:50:07'),(74,211,'2013-07-30 19:50:07'),(75,53,'2014-01-04 20:50:07'),(76,216,'2013-07-25 19:50:07'),(77,32,'2014-01-25 20:50:07'),(78,74,'2013-12-14 20:50:07'),(79,138,'2013-10-11 19:50:07'),(80,197,'2013-08-13 19:50:07'),(81,221,'2013-07-20 19:50:07'),(82,118,'2013-10-31 20:50:07'),(83,61,'2013-12-27 20:50:07'),(84,28,'2014-01-29 20:50:07'),(85,16,'2014-02-10 20:50:07'),(86,39,'2014-01-18 20:50:07'),(87,3,'2014-02-23 20:50:07'),(88,46,'2014-01-11 20:50:07'),(89,189,'2013-08-21 19:50:07'),(90,59,'2013-12-29 20:50:07'),(91,249,'2013-06-22 19:50:07'),(92,127,'2013-10-22 19:50:07'),(93,47,'2014-01-10 20:50:07'),(94,178,'2013-09-01 19:50:07'),(95,141,'2013-10-08 19:50:07'),(96,188,'2013-08-22 19:50:07'),(97,220,'2013-07-21 19:50:07'),(98,15,'2014-02-11 20:50:07'),(99,175,'2013-09-04 19:50:07'),(100,206,'2013-08-04 19:50:07');

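OK, now to the queries. Three ways came to mind. I omitted the right join that MDiesel used, because it is really just another way of writing a left join. It was invented for lazy SQL developers who don't want to swap the table names and would rather change a single word.

Anyway, the first query: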

select
c.*
from
customers c
left join orders o on c.customerId = o.customerId
where o.customerId is null;

This results in the following execution plan:

+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref              | rows | Extra                    |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
|  1 | SIMPLE      | c     | index | NULL          | PRIMARY     | 4       | NULL             |  250 | Using index              |
|  1 | SIMPLE      | o     | ref   | fk_customer   | fk_customer | 5       | wtf.c.customerId |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+

The second query:

select
c.*
from
customers c
where c.customerId not in (select distinct customerId from orders);

This results in the following execution plan:

+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type        | table  | type           | possible_keys | key         | key_len | ref  | rows | Extra                    |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
|  1 | PRIMARY            | c      | index          | NULL          | PRIMARY     | 4       | NULL |  250 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | orders | index_subquery | fk_customer   | fk_customer | 5       | func |    2 | Using index              |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+

The third query:

select
c.*
from
customers c
where not exists (select 1 from orders o where o.customerId = c.customerId);

This results in the following execution plan:

+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type        | table | type  | possible_keys | key         | key_len | ref              | rows | Extra                    |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
|  1 | PRIMARY            | c     | index | NULL          | PRIMARY     | 4       | NULL             |  250 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | o     | ref   | fk_customer   | fk_customer | 5       | wtf.c.customerId |    1 | Using where; Using index |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+

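In all three execution plans we can see that the customers table is read as a whole, but from the index (the implicit one, since the only column is the primary key). This may change when you select other columns from the table that are not part of an index.

The first query seems to be the best. For each row in customers, only one row in orders is read. The id column indicates that MySQL can do this in a single step, since only indexes are involved.

The second query seems to be the worst (though none of the three should perform too badly). For each row in customers, the subquery is executed (the select_type column tells you this).

The third query is not much different, since it also uses a dependent subquery, but it should perform better than the second. Explaining the small differences would lead too far here. If you are interested, here is the manual page that explains what each column and its values mean: EXPLAIN output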
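The plans above are MySQL output. As a rough analogue that runs without a MySQL server, here is a minimal sketch that asks SQLite for its plan of the same three queries through Python's sqlite3 module. It assumes SQLite as a stand-in, so the wording of the plan steps and the optimizer's choices will differ from MySQL's EXPLAIN; the explicit index is added because SQLite, unlike InnoDB, does not create one for a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerId INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        orderId    INTEGER PRIMARY KEY,
        customerId INTEGER REFERENCES customers (customerId),
        orderDate  TEXT
    );
    -- InnoDB adds this index for the foreign key; in SQLite it must be explicit.
    CREATE INDEX fk_customer ON orders (customerId);
""")

queries = {
    "left join": """
        select c.* from customers c
        left join orders o on c.customerId = o.customerId
        where o.customerId is null""",
    "not in": """
        select c.* from customers c
        where c.customerId not in (select distinct customerId from orders)""",
    "not exists": """
        select c.* from customers c
        where not exists (select 1 from orders o
                          where o.customerId = c.customerId)""",
}

plans = {}
for name, sql in queries.items():
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable description of the step.
    plans[name] = [row[-1] for row in rows]

for name, steps in plans.items():
    print(name)
    for step in steps:
        print("   ", step)
```

The exact text of each step depends on the SQLite version, so it is best read as a hint about scans versus index lookups rather than compared line by line with the MySQL tables.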

Finally: I'd say the first query will perform best, but as always, in the end you have to measure, measure and measure.
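One way to do that measuring is sketched below. It rebuilds a test set of the same shape (250 customers, 100 of them with exactly one order) in an in-memory SQLite database, checks that the three queries return the same customers, and times repeated runs. SQLite, the `measure` helper and the repeat count are assumptions made for illustration; the timings say nothing about MySQL or SQL Server, where the same experiment would have to be repeated against realistic data volumes.

```python
import random
import sqlite3
import time

N_CUSTOMERS, N_ORDERS = 250, 100

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerId INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        orderId    INTEGER PRIMARY KEY,
        customerId INTEGER REFERENCES customers (customerId),
        orderDate  TEXT
    );
    CREATE INDEX fk_customer ON orders (customerId);
""")
conn.executemany("INSERT INTO customers VALUES (?)",
                 [(i,) for i in range(1, N_CUSTOMERS + 1)])

# 100 distinct random customers place one order each, dated customerId days ago.
buyers = random.sample(range(1, N_CUSTOMERS + 1), N_ORDERS)
conn.executemany(
    "INSERT INTO orders (customerId, orderDate) VALUES (?, date('now', ?))",
    [(c, f"-{c} days") for c in buyers])

queries = {
    "left join": """
        select c.customerId from customers c
        left join orders o on c.customerId = o.customerId
        where o.customerId is null""",
    "not in": """
        select c.customerId from customers c
        where c.customerId not in (select distinct customerId from orders)""",
    "not exists": """
        select c.customerId from customers c
        where not exists (select 1 from orders o
                          where o.customerId = c.customerId)""",
}

def measure(sql, repeats=200):
    """Hypothetical helper: run a query `repeats` times, return (ids, seconds)."""
    start = time.perf_counter()
    for _ in range(repeats):
        rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return sorted(row[0] for row in rows), elapsed

results = {name: measure(sql) for name, sql in queries.items()}
for name, (ids, elapsed) in results.items():
    print(f"{name:<10} {len(ids)} rows  {elapsed * 1000:.1f} ms")
```

Increasing N_CUSTOMERS and N_ORDERS is the equivalent of raising the numbers in the procedure above.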

Answer 3

Solutions involving an outer join will perform better than solutions using NOT IN.
