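I'm off to a slow start this morning. I think there is a more efficient way to write the following query using joins instead of two independent selects. Am I wrong?
Note that I have simplified the query down to this example for the purposes of SO, so let me know if you have any questions.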
SELECT DISTINCT c.*
FROM customers c
WHERE c.customer_id IN (select customer_id from customers_cars where car_make = 'BMW')
AND c.customer_id IN (select customer_id from customers_cars where car_make = 'Ford')
;
-- Simple tables to demonstrate point
CREATE TABLE customers (
customer_id serial,
name text
);
CREATE TABLE customers_cars (
customer_id integer,
car_make text
);
-- Populate tables
INSERT INTO customers(name) VALUES
('Joe Dirt'),
('Penny Price'),
('Wooten Nagen'),
('Captain Planet')
;
INSERT INTO customers_cars(customer_id,car_make) VALUES
(1,'BMW'),
(1,'Merc'),
(1,'Ford'),
(2,'BMW'),
(2,'BMW'), -- Notice car_make is not unique
(2,'Ferrari'),
(2,'Porche'),
(3,'BMW'),
(3,'Ford');
-- ids 1 and 3 both have BMW and Ford
Answer 0 (score: 2)
Another option; I don't know which is fastest on large tables.
SELECT customers.*
FROM customers
JOIN customers_cars USING(customer_id)
WHERE car_make = ANY(ARRAY['BMW','Ford'])
GROUP BY
customer_id, name
HAVING array_agg(car_make) @> ARRAY['BMW','Ford'];
vol7ron: Fiddle
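Below is a modification of the above that uses the same idea of comparing with arrays. I'm not sure how efficient it is compared with the two-subquery approach, since it has to build an array for each customer in one pass and then do a heavier comparison, because the arrays are compared element by element.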
SELECT DISTINCT c.*
FROM customers c
WHERE customer_id IN (
select customer_id
from customers_cars
group by customer_id
having array_agg(car_make) @> ARRAY['BMW','Ford']
);
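To make the array comparison concrete, here is a small sketch of how the `@>` ("contains") operator behaves on literal arrays. The values are illustrative and not taken from the tables; the point is that duplicates on the left-hand side do not matter, which is why this approach tolerates the repeated BMW rows.

```sql
-- @> is true when every element of the right array appears in the left array.
SELECT ARRAY['BMW','BMW','Ford'] @> ARRAY['BMW','Ford'] AS has_both;     -- true
SELECT ARRAY['BMW','BMW','Ferrari'] @> ARRAY['BMW','Ford'] AS has_both;  -- false
```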
Answer 1 (score: 1)
I would write it as
SELECT DISTINCT c.customer_id
FROM customers c
JOIN customers_cars cc_f on c.customer_id = cc_f.customer_id and cc_f.car_make = 'Ford'
JOIN customers_cars cc_b on c.customer_id = cc_b.customer_id and cc_b.car_make = 'BMW'
;
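Whether this is better, I don't know. In some RDBMSs, plain joins like this perform better than subqueries, but I don't know about Postgres. From a readability standpoint it is also questionable.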
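One way to settle the question on real data is to ask Postgres for the actual plan and timing of each variant. This is only a sketch: the results depend entirely on table size, indexes, and statistics, so nothing is claimed here about which query wins.

```sql
-- ANALYZE executes the query and reports real timings; BUFFERS adds I/O counts.
-- Run the same wrapper around the IN / EXISTS versions and compare the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT c.customer_id
FROM customers c
JOIN customers_cars cc_f ON c.customer_id = cc_f.customer_id AND cc_f.car_make = 'Ford'
JOIN customers_cars cc_b ON c.customer_id = cc_b.customer_id AND cc_b.car_make = 'BMW';
```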
Answer 2 (score: 1)
It looks to me like you are trying to find customers who own at least one BMW and at least one Ford. This query should work for you:
SELECT
customers.customer_id
FROM
customers
INNER JOIN customers_cars ON
customers.customer_id = customers_cars.customer_id
AND customers_cars.car_make IN ('BMW', 'Ford')
GROUP BY
customers.customer_id
HAVING
COUNT(CASE WHEN car_make = 'BMW' THEN 1 ELSE NULL END) > 0
AND COUNT(CASE WHEN car_make = 'Ford' THEN 1 ELSE NULL END) > 0
Make sure you have indexes on customers_cars.customer_id and customers_cars.car_make for best performance.
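A minimal sketch of what those indexes could look like, assuming the table is named customers_cars as in the question. The index names are made up for illustration, and whether separate indexes or one composite index works better is something to check against your own data.

```sql
-- Literal reading of the advice: one index per column (hypothetical names).
CREATE INDEX customers_cars_customer_id_idx ON customers_cars (customer_id);
CREATE INDEX customers_cars_car_make_idx    ON customers_cars (car_make);

-- Possible alternative (an assumption, not from the answer): a single
-- composite index covering the car_make filter and the customer_id lookup.
-- CREATE INDEX customers_cars_make_customer_idx
--     ON customers_cars (car_make, customer_id);
```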
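Answer 3 (score: 0)
You don't need to join to customers at all (given referential integrity).
Generally, this is a case of relational division. We have assembled an arsenal of techniques under this related question:
If customers_cars were defined as unique on (customer_id, car_make), it would be much simpler: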
SELECT customer_id
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1
HAVING count(*) = 2;
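For reference, the uniqueness assumed by that query could be declared roughly as follows. This is a sketch only: the constraint name is made up, and the statement would be rejected on the question's sample data because (2,'BMW') appears twice.

```sql
-- Hypothetical constraint name; fails while duplicate (customer_id, car_make) rows exist.
ALTER TABLE customers_cars
    ADD CONSTRAINT customers_cars_customer_make_uni UNIQUE (customer_id, car_make);
```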
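Since (customer_id, car_make) is not unique, we need an extra step.
For just a few cars, your original query is not that bad. But (especially with duplicates!) EXISTS is typically faster than IN, and we don't need the final DISTINCT: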
SELECT customer_id -- no DISTINCT needed.
FROM customers c
WHERE EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'BMW')
AND EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'Ford');
For longer lists of cars, the above query gets verbose and less efficient. For an arbitrary number of cars I suggest:
SELECT customer_id
FROM (
SELECT customer_id, car_make
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1, 2
) sub
GROUP BY 1
HAVING count(*) = 2;
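A shorter formulation that should return the same customers folds the duplicates with count(DISTINCT ...) instead of a subquery. This variant is not part of the answer above and no claim is made about which one is faster; the ORDER BY is added only to make the output order predictable.

```sql
-- Counts each car_make once per customer, so the duplicate (2,'BMW') row is harmless.
SELECT customer_id
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY customer_id
HAVING count(DISTINCT car_make) = 2
ORDER BY customer_id;
```

On the sample data from the question this yields customer_id 1 and 3.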