Question

I'm trying to run a simple SQL query:

SELECT DISTINCT id
FROM marketing
WHERE type = 'email'
  AND id NOT IN (
                SELECT id
                FROM marketing
                WHERE type = 'letter'
                )
ORDER BY id;

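It takes a very long time to run. I assume this has to do with the select inside the where clause (it returns a large number of ids), but I can't think of a way to improve it.

First, could this be the reason the query is so slow? Second, are there any suggestions on how to improve it?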

Edit:

Database system: MySQL

id is indexed, but it is not the primary key of this table; it is a foreign key.

Answer 1

There is a known pattern for this type of query: get all rows that have no match in another set.

select m1.id from marketing m1
left outer join marketing m2 on m1.id = m2.id and m2.type = 'letter'
where m1.type = 'email' and m2.id IS NULL

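This returns all rows in marketing of type 'email' for which no row with the same id and type 'letter' exists. If you want the other set, use IS NOT NULL. For maximum execution speed you need a proper index on the id column, with type as a covered column.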
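As a rough illustration of the anti-join pattern, here is a minimal Python sketch. It makes several assumptions that are not in the original post: SQLite (via Python's built-in sqlite3 module) stands in for MySQL, the sample rows are invented, and the index name idx_marketing_id_type is hypothetical. DISTINCT is added so that an id with several 'email' rows appears only once, as in the question's query.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marketing (id INTEGER, type TEXT)")

# Hypothetical index name. id drives the join; type is the covered column.
conn.execute("CREATE INDEX idx_marketing_id_type ON marketing (id, type)")

conn.executemany(
    "INSERT INTO marketing (id, type) VALUES (?, ?)",
    [
        (1, "email"), (1, "letter"),   # has a letter: excluded
        (2, "email"),                  # email only: kept
        (3, "letter"),                 # no email: excluded
        (4, "email"), (4, "email"),    # duplicate emails: kept once
    ],
)

anti_join = """
    SELECT DISTINCT m1.id
    FROM marketing m1
    LEFT OUTER JOIN marketing m2
           ON m1.id = m2.id AND m2.type = 'letter'
    WHERE m1.type = 'email' AND m2.id IS NULL
    ORDER BY m1.id
"""
email_only_ids = [row[0] for row in conn.execute(anti_join)]
print(email_only_ids)  # [2, 4]
```

On MySQL the equivalent index could be created with CREATE INDEX on marketing (id, type); whether the optimizer actually uses it is best checked with EXPLAIN on the real data.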

Answer 2

select distinct a.id
from   marketing a
where  a.type = 'email'
and    not exists (
           select 'X'
           from   marketing b
           where  a.id = b.id
           and    b.type = 'letter' )
order by a.id

Answer 3

This is an alternative to your query, although according to Quassnoi's comparison for MySQL it should perform similarly.

   select email.id
     from marketing email
left join marketing letter on letter.type='letter' and letter.id=email.id
    where email.type='email' and letter.id is null
 group by email.id
 order by email.id;

The three main ways to write this kind of query are NOT IN, NOT EXISTS (correlated), and LEFT JOIN / IS NULL. Quassnoi compares them for MySQL (the article mentioned above), SQL Server, Oracle, and PostgreSQL.
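To make the comparison concrete, the sketch below runs all three forms against the same small table and collects their results. It is only an illustration under stated assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, the sample rows are invented, and the helper run_all is hypothetical. It says nothing about relative speed, only about which ids come back. The second run adds a 'letter' row with a NULL id to show the one semantic difference worth knowing: under standard SQL three-valued logic, NOT IN returns no rows once the subquery yields a NULL, while the other two forms are unaffected.

```python
import sqlite3

# The three formulations discussed above, written against the same table.
QUERIES = {
    "not_in": """
        SELECT DISTINCT id FROM marketing
        WHERE type = 'email'
          AND id NOT IN (SELECT id FROM marketing WHERE type = 'letter')
        ORDER BY id""",
    "not_exists": """
        SELECT DISTINCT a.id FROM marketing a
        WHERE a.type = 'email'
          AND NOT EXISTS (SELECT 1 FROM marketing b
                          WHERE b.id = a.id AND b.type = 'letter')
        ORDER BY a.id""",
    "left_join": """
        SELECT DISTINCT e.id FROM marketing e
        LEFT JOIN marketing l ON l.id = e.id AND l.type = 'letter'
        WHERE e.type = 'email' AND l.id IS NULL
        ORDER BY e.id""",
}


def run_all(conn):
    """Hypothetical helper: run every query and return {name: [ids]}."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in QUERIES.items()}


# Assumption: SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marketing (id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO marketing (id, type) VALUES (?, ?)",
    [(1, "email"), (1, "letter"), (2, "email"), (3, "letter"), (4, "email")],
)

# With no NULL ids, all three forms agree.
without_nulls = run_all(conn)

# A NULL id among the 'letter' rows makes NOT IN evaluate to unknown
# for every candidate id, so that form returns nothing.
conn.execute("INSERT INTO marketing (id, type) VALUES (NULL, 'letter')")
with_null = run_all(conn)

print(without_nulls)
print(with_null)
```

If id can be NULL in the real table (it is described as a foreign key, not a primary key), NOT EXISTS or LEFT JOIN / IS NULL is the safer choice.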

Answer 4

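You can also phrase this as an aggregation query. The condition you are looking for is that an id has at least one row with type = 'email' and no rows with type = 'letter':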

select id
from marketing m
group by id
having SUM(case when type = 'letter' then 1 else 0 end) = 0 and
       SUM(case when type = 'email' then 1 else 0 end) > 0

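An index on marketing(id, type) would probably make this query run faster. The order by id is redundant in older MySQL versions because group by performs a sort; MySQL 8.0 removed that implicit sorting, so keep an explicit order by id there.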
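The aggregate form can be tried out with a small sketch like the one below. The assumptions are the same as before and are not part of the original answer: SQLite (Python's sqlite3 module) stands in for MySQL, the rows are invented, and the index name idx_marketing_id_type is hypothetical. An explicit ORDER BY is included so the result order does not depend on how the engine implements GROUP BY.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marketing (id INTEGER, type TEXT)")

# Hypothetical index name, matching the marketing(id, type) suggestion.
conn.execute("CREATE INDEX idx_marketing_id_type ON marketing (id, type)")

conn.executemany(
    "INSERT INTO marketing (id, type) VALUES (?, ?)",
    [
        (1, "email"), (1, "letter"),   # has a letter: excluded
        (2, "email"),                  # email only: kept
        (3, "letter"),                 # no email: excluded
        (4, "email"), (4, "email"),    # several emails: kept once
        (5, "phone"),                  # neither type: excluded
    ],
)

aggregate = """
    SELECT id
    FROM marketing
    GROUP BY id
    HAVING SUM(CASE WHEN type = 'letter' THEN 1 ELSE 0 END) = 0
       AND SUM(CASE WHEN type = 'email'  THEN 1 ELSE 0 END) > 0
    ORDER BY id
"""
aggregate_ids = [row[0] for row in conn.execute(aggregate)]
print(aggregate_ids)  # [2, 4]
```

Because the grouping already yields one row per id, this form needs no DISTINCT.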

Improving a query
