Question

I have the following statement to find unambiguous names in my data (about 1 million entries):

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname) 
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

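Oracle reports a huge cost of 1477315000, and execution has not finished after 5 minutes. Simply splitting the OR into its own NOT EXISTS subclause brings the run time down to 0.5 seconds and the cost down to 45000: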

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where p1.Surname = p2.Surname and
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and not exists (
   select * from person p2 where p1.Surname = p2.Altname and 
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

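Tuning the query to its best is not my concern, since it is only a seldom-executed query, and I know that the CONCAT defeats any index, but I just wonder where this high cost comes from. Both queries seem semantically equivalent to me.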

Answer 1

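The answer is in the EXPLAIN PLAN for your queries. They may be semantically equivalent, but the execution plans behind the scenes are vastly different.

EXISTS operates differently from a JOIN, and essentially, your OR filter statement is what joins the tables together.

No JOIN occurs in the second query, as you are only retrieving records from one table.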
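
As a minimal sketch of how to look at those plans in Oracle: EXPLAIN PLAN FOR stores the optimizer's plan for a statement without running it, and DBMS_XPLAN.DISPLAY prints the most recently explained plan. The statement below is the slow OR form from the question, and it assumes a plan table is available to your session.

```sql
-- Record the optimizer's plan for the OR form without executing it
EXPLAIN PLAN FOR
select Prename, Surname from person p1
where Prename is not null and Surname is not null
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname)
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL;

-- Show the plan that was just recorded (operations, estimated rows, cost)
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Repeating the same two steps for the split form lets you compare the operations and the cost column of both plans side by side.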

Answer 2

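The results of your two queries may be semantically equivalent, but the execution is not operationally equivalent. Your second example never uses the OR operator to combine predicates. All of the predicates in the second example are combined with AND.

The performance is better because, if the first predicate combined with an AND does not evaluate to true, the second (and any further) predicate is skipped (not evaluated). If you use an OR, both (or all) predicates frequently have to be evaluated, which slows the query down. (ORed predicates are checked until one evaluates to true.)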
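
Why the two forms return the same rows can be sketched as a short derivation. The symbols are shorthand introduced here, not part of the question: for a fixed outer row p1, S means p1.Surname = p2.Surname, A means p1.Surname = p2.Altname, and C stands for the shared LIKE and id conditions.

```latex
% S: Surname match, A: Altname match, C: shared LIKE and id <> id conditions
\begin{align*}
\neg\,\exists p_2\,\bigl[(S \lor A) \land C\bigr]
  &\equiv \neg\,\exists p_2\,\bigl[(S \land C) \lor (A \land C)\bigr] \\
  &\equiv \neg\,\bigl(\exists p_2\,[S \land C] \;\lor\; \exists p_2\,[A \land C]\bigr) \\
  &\equiv \neg\,\exists p_2\,[S \land C] \;\land\; \neg\,\exists p_2\,[A \land C]
\end{align*}
```

The last line is the split query with its two NOT EXISTS subclauses combined by AND. The steps also hold with NULL values, because EXISTS only counts rows for which the whole condition is true.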

Answer 3

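I would consider testing the query rewritten as below. Do a direct join from one to the other on the criteria that qualify as a match. Then, in the WHERE clause, throw the row out if it does not come up with a match: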

select 
      p1.Prename, 
      p1.Surname
   from 
      person p1 
         join person p2
            on p1.ID <> p2.ID
            and (  p1.Surname = p2.Surname
                or p1.SurName = p2.AltName )
            and p2.PreName like concat( concat( '%', p1.Prename ), '%' )
   where
          p1.PreName is not null
      and p1.SurName is not null
      and p1.Inv_date is null
      and p2.id is null

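Per your comments, and from what it appeared you were looking for: no, do not do a left outer join. If you are looking for names that are alike and that you want to purge (however you will handle that), you only want to prequalify those records that do have a match via the self-join (hence a normal join). If a name has no similar names, you probably want to leave it alone, so it is automatically left out of the result set.

Now the WHERE clause kicks in. You have a valid person on the left that has a person on the right. These are the duplicates, so you have the match; throwing in the condition "p2.ID IS NULL" then creates the same result as NOT EXISTS, giving the final results.

I put my query back to a normal "join".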

Why does SQL cost explode with a simple "or"?
