Question

Somehow this doesn't look efficient. Can it be optimized?

SELECT DISTINCT p.col1 from table1 p where p.col1 not in
(SELECT DISTINCT o.col1 from table1 o where o.col2 = 'ABC')

For example: select all supermarkets that do not have product = soap.

Answer 1

You want all col1 values for which col2 is never 'ABC'. You can handle this with aggregation:

select p.col1
from table1 p
group by p.col1
having sum(case when p.col2 = 'ABC' then 1 else 0 end) = 0;

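Why would this be faster? Well, there are cases where it won't be, but often it will. In any case, select distinct is already doing an aggregation, so the other approaches that use a join or in add extra work on top of it. That extra work is worthwhile only if it significantly reduces the amount of data being processed.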

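Also, not in is semantically dangerous. If col1 is NULL on any row where col2 = 'ABC', then all of the data is filtered out; that is, the query returns no rows at all. That would certainly speed it up! This formulation assumes that col1 is never NULL in that case.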
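Both points can be checked on a toy table. The sketch below runs the question's NOT IN query and the aggregation query in SQLite through Python's sqlite3 module, not Oracle, with hypothetical rows: first on data without NULLs, where they agree, then after adding one row whose col1 is NULL and col2 is 'ABC'.

```python
import sqlite3

NOT_IN_SQL = """
    SELECT DISTINCT p.col1 FROM table1 p
    WHERE p.col1 NOT IN (SELECT o.col1 FROM table1 o WHERE o.col2 = 'ABC')
    ORDER BY p.col1
"""

AGGREGATE_SQL = """
    SELECT p.col1 FROM table1 p
    GROUP BY p.col1
    HAVING SUM(CASE WHEN p.col2 = 'ABC' THEN 1 ELSE 0 END) = 0
    ORDER BY p.col1
"""

def run(sql, rows):
    """Load rows into an in-memory table1 and return the col1 values selected by sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = [r[0] for r in conn.execute(sql)]
    conn.close()
    return result

# Hypothetical data without NULLs: both queries agree.
clean = [("s1", "ABC"), ("s1", "x"), ("s2", "x"), ("s2", "y"), ("s3", "y")]
print(run(NOT_IN_SQL, clean))     # ['s2', 's3']
print(run(AGGREGATE_SQL, clean))  # ['s2', 's3']

# Add one row with a NULL col1 and col2 = 'ABC'. The subquery now yields
# ('s1', NULL), and "x NOT IN ('s1', NULL)" is never TRUE, so NOT IN
# returns nothing; the aggregation still drops only groups containing 'ABC'
# (the NULL group included).
with_null = clean + [(None, "ABC")]
print(run(NOT_IN_SQL, with_null))     # []
print(run(AGGREGATE_SQL, with_null))  # ['s2', 's3']
```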

Finally, if you already have a list of unique col1 values, the fastest approach is probably:

select c.col1
from col1table c
where not exists (select 1 from table1 o where o.col1 = c.col1 and o.col2 = 'ABC')

For this query, an index on table1(col1, col2) gives the best performance.
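A minimal sketch of this pattern, run in SQLite through Python's sqlite3 module. The sample rows and the index name `table1_col1_col2` are hypothetical, and Oracle's planner may use the index differently. It also shows one semantic difference: a col1 value that has no rows in table1 at all (`s4` below) is returned, which the original query never does, because the original only looks at values that appear in table1.

```python
import sqlite3

def unique_col1_without_abc():
    """Anti-join from a table of unique col1 values into table1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE col1table (col1 TEXT PRIMARY KEY);
        CREATE TABLE table1 (col1 TEXT, col2 TEXT);
        -- Composite index so the (col1, col2) probe can be answered from the index.
        CREATE INDEX table1_col1_col2 ON table1 (col1, col2);
        INSERT INTO col1table VALUES ('s1'), ('s2'), ('s3'), ('s4');
        INSERT INTO table1 VALUES ('s1', 'ABC'), ('s1', 'x'), ('s2', 'x'), ('s3', 'ABC');
    """)
    rows = conn.execute("""
        SELECT c.col1 FROM col1table c
        WHERE NOT EXISTS (
            SELECT 1 FROM table1 o WHERE o.col1 = c.col1 AND o.col2 = 'ABC')
        ORDER BY c.col1
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(unique_col1_without_abc())  # ['s2', 's4']
```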

Answer 2

Have you tried the query with a negated condition instead?

i.e.

select distinct col1 from table1 where col2 <> 'ABC'
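Before adopting a rewrite like this it is worth checking that it returns the same rows. `col2 <> 'ABC'` is tested row by row, so it keeps any col1 that has at least one row with a different col2, even if that col1 also has an 'ABC' row. A minimal sketch in SQLite via Python's sqlite3, with 'soap' standing in for 'ABC' and hypothetical store names:

```python
import sqlite3

# Hypothetical data: Store A stocks soap and bread, Store B only bread.
ROWS = [("Store A", "soap"), ("Store A", "bread"), ("Store B", "bread")]

def compare():
    """Return (row-level <> filter result, original NOT IN query result)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", ROWS)
    row_filter = [r[0] for r in conn.execute(
        "SELECT DISTINCT col1 FROM table1 WHERE col2 <> 'soap' ORDER BY col1")]
    original = [r[0] for r in conn.execute("""
        SELECT DISTINCT p.col1 FROM table1 p
        WHERE p.col1 NOT IN (SELECT o.col1 FROM table1 o WHERE o.col2 = 'soap')
        ORDER BY p.col1""")]
    conn.close()
    return row_filter, original

print(compare())  # (['Store A', 'Store B'], ['Store B'])
```

Store A stocks soap but is still returned by the row-level filter, so for the supermarket example the two queries are not interchangeable.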

Answer 3

I would structure it like this:

select supermarkets.*
from   supermarkets
where  not exists (
         select 1
         from   product_in_supermarkets
         where  product_in_supermarkets.supermarket_id = supermarkets.id and
                product_in_supermarkets.product_type = 'soap')

And create this index for the best performance:

product_in_supermarkets(supermarket_id, product_type)

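Having said that, under the right circumstances the NOT EXISTS and NOT IN queries may be transformed into the same thing and executed as an anti-join. Semantically, I prefer the NOT EXISTS correlated subquery, because I think it better represents the intent of the query.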
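The NOT EXISTS query can be tried on a toy version of this schema. The sketch below uses SQLite through Python's sqlite3 module; the `name` column, the index name `pis_supermarket_product` and the rows are hypothetical. It only checks that NOT EXISTS and NOT IN return the same rows when the subquery yields no NULLs; it says nothing about whether a particular optimizer (Oracle's or SQLite's) turns both into the same anti-join.

```python
import sqlite3

def soap_free_supermarkets():
    """Return (NOT EXISTS result, NOT IN result) for the 'no soap' question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE supermarkets (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product_in_supermarkets (supermarket_id INTEGER, product_type TEXT);
        CREATE INDEX pis_supermarket_product
            ON product_in_supermarkets (supermarket_id, product_type);
        INSERT INTO supermarkets VALUES (1, 'North'), (2, 'South'), (3, 'East');
        INSERT INTO product_in_supermarkets VALUES
            (1, 'soap'), (1, 'bread'), (2, 'bread');
    """)
    not_exists = conn.execute("""
        SELECT supermarkets.* FROM supermarkets
        WHERE NOT EXISTS (
            SELECT 1 FROM product_in_supermarkets
            WHERE product_in_supermarkets.supermarket_id = supermarkets.id
              AND product_in_supermarkets.product_type = 'soap')
        ORDER BY supermarkets.id
    """).fetchall()
    not_in = conn.execute("""
        SELECT supermarkets.* FROM supermarkets
        WHERE supermarkets.id NOT IN (
            SELECT supermarket_id FROM product_in_supermarkets
            WHERE product_type = 'soap')
        ORDER BY supermarkets.id
    """).fetchall()
    conn.close()
    return not_exists, not_in

print(soap_free_supermarkets())
# ([(2, 'South'), (3, 'East')], [(2, 'South'), (3, 'East')])
```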

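NOT IN is also prone to surprises if there are NULLs in the subquery's projection, because no value (including NULL) can be said to be NOT IN a list that contains a NULL.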
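The same three-valued logic shows up with literal lists. A minimal sketch using SQLite through Python's sqlite3; SQLite follows the standard rules for IN here, and Oracle should give the same results for these cases, though that is worth confirming on your own database. SQL NULL comes back to Python as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def eval_sql(expr):
    """Evaluate a constant SQL expression (trusted literals only, not user input)."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

print(eval_sql("3 NOT IN (1, 2)"))        # 1 (TRUE)
print(eval_sql("3 NOT IN (1, 2, NULL)"))  # None (UNKNOWN)
print(eval_sql("1 NOT IN (1, NULL)"))     # 0 (FALSE)
print(eval_sql("NULL NOT IN (1, 2)"))     # None (UNKNOWN)

# A WHERE clause keeps a row only when the predicate is TRUE, so UNKNOWN drops it.
kept = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1) WHERE 3 NOT IN (1, 2, NULL)").fetchone()[0]
print(kept)  # 0
```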

Answer 4

I think you should consider creating an index on col1.

I would also try:

select distinct p.col1 from table1 p where not exists
(select distinct o.col1 from table1 o where o.col1 = p.col1 and o.col2 = 'ABC');

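Also, depending on the number of rows and the data entropy, avoiding the distinct in the inner query can sometimes be a useful trade-off.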
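EXISTS only asks whether the subquery returns at least one row, so the distinct inside it cannot change the result; whether it changes the cost is up to the optimizer. A minimal sketch in SQLite via Python's sqlite3 (hypothetical rows; the index name `table1_col1` is also hypothetical and stands in for the suggested index on col1) checks that both variants return the same col1 values:

```python
import sqlite3

ROWS = [("s1", "ABC"), ("s1", "x"), ("s2", "x"), ("s2", "y"), ("s3", "y")]

WITH_INNER_DISTINCT = """
    SELECT DISTINCT p.col1 FROM table1 p
    WHERE NOT EXISTS (SELECT DISTINCT o.col1 FROM table1 o
                      WHERE o.col1 = p.col1 AND o.col2 = 'ABC')
    ORDER BY p.col1
"""

WITHOUT_INNER_DISTINCT = """
    SELECT DISTINCT p.col1 FROM table1 p
    WHERE NOT EXISTS (SELECT 1 FROM table1 o
                      WHERE o.col1 = p.col1 AND o.col2 = 'ABC')
    ORDER BY p.col1
"""

def run(sql):
    """Build table1 with an index on col1 and return the selected col1 values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT)")
    conn.execute("CREATE INDEX table1_col1 ON table1 (col1)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", ROWS)
    result = [r[0] for r in conn.execute(sql)]
    conn.close()
    return result

print(run(WITH_INNER_DISTINCT))     # ['s2', 's3']
print(run(WITHOUT_INNER_DISTINCT))  # ['s2', 's3']
```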

What is a better way to write this Oracle SQL query?
