Question

I have a query like this:

select count(distinct tab1.id)
from tab1 join tab2 on tab1.email = tab2.email_a

But when I change it to:

select count(distinct tab1.id)
from tab1 join tab2 on tab1.email = tab2.email_a or tab1.email = tab2.email_b

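it suddenly becomes very inefficient. I know I can write the query using two join statements, but what exactly is Vertica doing in the second statement that makes it so slow?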

Answer 1

Actually, I would expect an OR predicate to perform worse regardless of the DBMS:

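Optimized JOIN operations rely, at least usually and at least in part, on a physical design that can support the join (indexes in other databases, projection design in Vertica).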

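But as soon as you apply any expression to either of the join operands before comparing, that goes out of the window. This includes CAST, functions, arithmetic and, for that matter, Boolean operations like OR.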

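So far, I have not found a single case of operating on the join operands before the comparison where the risk of confusing the optimizer into choosing a worse plan was not too high.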

So I would expect the optimizer to pick a less than ideal plan here.
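As a rough sketch of the two-join alternative the question mentions, each branch below keeps a plain equality predicate, which a physical design can support. The derived-table alias matched is a name made up for this illustration, and whether this actually runs faster depends on the projections and data, so treat it as something to try rather than a guaranteed fix.

```sql
-- Each branch is a simple equi-join on one column.
-- count(distinct ...) on the outside removes ids that match on both
-- email_a and email_b, so the result equals that of the OR query.
select count(distinct id)
from (
    select tab1.id
    from tab1 join tab2 on tab1.email = tab2.email_a
    union all
    select tab1.id
    from tab1 join tab2 on tab1.email = tab2.email_b
) matched;  -- "matched" is a hypothetical alias
```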

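@Hanmyo - could you find a way to run an EXPLAIN of the query you want, once with and once without the OR in the predicate, so we can see the difference between the plans?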
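For reference, the comparison could look roughly like this, using the two queries from the question. Vertica accepts EXPLAIN as a prefix to a query; the plans it prints depend on the actual projections and statistics, so no output is shown here.

```sql
-- Plan for the single equality predicate
explain
select count(distinct tab1.id)
from tab1 join tab2 on tab1.email = tab2.email_a;

-- Plan for the OR predicate
explain
select count(distinct tab1.id)
from tab1 join tab2 on tab1.email = tab2.email_a or tab1.email = tab2.email_b;
```

Comparing the join operator and its cost estimate in the two plans should show where the optimizer changes strategy.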

Cheers - Marco

Answer 2

OR is a performance killer.

How does this version perform?

select count(tab1.id)
from tab1 
where exists (select 1 from tab2 where tab1.email = tab2.email_a) or 
      exists (select 1 from tab2 where tab1.email = tab2.email_b);

I am guessing that tab1.id is unique, so you don't need select distinct.

Vertica SQL: "or" in join causing huge slowdown?
