Question

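I have two tables, customer and coupons. A customer may or may not be assigned a reward_id, so it is a nullable column. A customer can have many coupons, and each coupon belongs to one customer.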

+-------------+------------+
|   coupons   | customers  |
+-------------+------------+
| id          | id         |
| customer_id | first_name |
| code        | reward_id  |
+-------------+------------+
customer_id column is indexed

I want to join the two tables.

My attempt is:

select c.*, cust.id as cust_id, cust.first_name as cust_name
from coupons c
join customer cust
on c.customer_id = cust.id and cust.reward_id is not null

However, I don't think there is an index on reward_id, so perhaps I should move cust.reward_id is not null into the where clause:

select c.*, cust.id as cust_id, cust.first_name as cust_name
from coupons c
join customer cust
on c.customer_id = cust.id
where cust.reward_id is not null

I would like to know whether the second attempt is more efficient than the first.

Answer 1

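It would be better to look at the execution plan yourself. Add EXPLAIN ANALYZE in front of your select statement and run both versions to see the difference.

Here's how: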

EXPLAIN ANALYZE select ...

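What does it do? It actually executes the select statement and returns the execution plan chosen by the query optimizer, together with the real timings. Without the ANALYZE keyword it only shows the estimated plan and does not execute the statement.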
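As a sketch, this is what that looks like when applied to the two queries from the question (table and column names are taken from the question; nothing else is assumed). Compare the plan shape and the reported execution time of the two outputs:

```sql
-- Variant 1: the reward_id filter lives in the ON clause.
EXPLAIN ANALYZE
SELECT c.*, cust.id AS cust_id, cust.first_name AS cust_name
FROM coupons c
JOIN customer cust
  ON c.customer_id = cust.id
 AND cust.reward_id IS NOT NULL;

-- Variant 2: the reward_id filter lives in the WHERE clause.
EXPLAIN ANALYZE
SELECT c.*, cust.id AS cust_id, cust.first_name AS cust_name
FROM coupons c
JOIN customer cust
  ON c.customer_id = cust.id
WHERE cust.reward_id IS NOT NULL;
```

Because EXPLAIN ANALYZE really runs the statement, timings from a single run can be skewed by caching, so it is worth running each variant more than once.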

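For a join like this the planner will generally pick a single index on customer, so with an index on customer(id) in place it is unlikely to also use an index on customer(reward_id). (PostgreSQL can combine several indexes through bitmap scans, but that rarely pays off for a lookup by id.) The reward_id condition is then simply applied as a filter, which is the correct behaviour.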

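You could experiment with the performance of a partial index defined as customer(id) where reward_id is not null. This reduces the index size, because it only stores the ids of customers that have a reward_id assigned.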
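A minimal sketch of such a partial index follows. The index name is hypothetical; the table and column names come from the question.

```sql
-- Hypothetical index name; only rows with a reward_id are indexed.
CREATE INDEX customer_id_with_reward_idx
    ON customer (id)
    WHERE reward_id IS NOT NULL;
```

The planner can only consider a partial index when the query's own conditions imply the index predicate. Both queries in the question include cust.reward_id is not null, so either form should qualify.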

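I generally like to keep the relationship/join logic separate from the filter conditions, so I put the filters in the WHERE clause myself. They are more visible there, and the query is easier to read later if more changes are made.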

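I suggest you measure the possible performance gain yourself, because it depends on how much data there is and on how selective the reward_id condition is. For example, if most rows have a value in this column, it will not make much of a difference, since the normal and the partial index will be almost the same size.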
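One rough way to check this, assuming the customer table from the question, is to count how many rows actually have a reward_id. The column aliases are made up for illustration.

```sql
-- count(reward_id) skips NULLs, count(*) counts every row.
SELECT count(*)         AS total_customers,
       count(reward_id) AS customers_with_reward,
       round(100.0 * count(reward_id) / nullif(count(*), 0), 1) AS pct_with_reward
FROM customer;
```

If pct_with_reward is close to 100, the partial index will be nearly as large as a regular index on id and is unlikely to help much.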

Answer 2

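In a PostgreSQL inner join, placing a filter condition in the ON clause or in the WHERE clause makes no difference to the query result or to performance.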
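The "inner join" qualifier matters. As an illustration only (the question itself uses an inner join), here is the same pair of queries written with a LEFT JOIN, where the placement of the filter does change the result:

```sql
-- Filter in ON: every coupon is returned; the customer columns are NULL
-- for coupons whose customer has no reward_id.
SELECT c.*, cust.id AS cust_id, cust.first_name AS cust_name
FROM coupons c
LEFT JOIN customer cust
  ON c.customer_id = cust.id
 AND cust.reward_id IS NOT NULL;

-- Filter in WHERE: rows without a reward_id are discarded after the join,
-- so this behaves like the inner join from the question.
SELECT c.*, cust.id AS cust_id, cust.first_name AS cust_name
FROM coupons c
LEFT JOIN customer cust
  ON c.customer_id = cust.id
WHERE cust.reward_id IS NOT NULL;
```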

Here is a guide that explores the topic in more detail: https://app.pluralsight.com/guides/using-on-versus-where-clauses-to-combine-and-filter-data-in-postgresql-joins

PostgreSQL ON vs WHERE when joining tables?
