Making simple SQL more efficient

Time: 2014-11-05 14:41:53

Tags: sql postgresql relational-division

SQL Fiddle.

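I'm off to a slow start this morning. I think there is a more efficient way to write the following query with joins instead of two separate subselects. Am I wrong?

Note that I have simplified this query down to this example for the purposes of SO, so let me know if anything is unclear.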

SELECT DISTINCT c.* 
FROM   customers c
WHERE  c.customer_id IN (select customer_id from customers_cars where car_make = 'BMW')
  AND  c.customer_id IN (select customer_id from customers_cars where car_make = 'Ford')
;

Sample table schema

-- Simple tables to demonstrate point
CREATE TABLE customers (
  customer_id serial,
  name text
  );

CREATE TABLE customers_cars (
  customer_id integer,
  car_make text
  );


-- Populate tables
INSERT INTO customers(name) VALUES
  ('Joe Dirt'),
  ('Penny Price'),
  ('Wooten Nagen'),
  ('Captain Planet')
;

INSERT INTO customers_cars(customer_id,car_make) VALUES
  (1,'BMW'),
  (1,'Merc'),
  (1,'Ford'),
  (2,'BMW'),
  (2,'BMW'),      -- Notice car_make is not unique
  (2,'Ferrari'),
  (2,'Porsche'),
  (3,'BMW'),
  (3,'Ford');
-- ids 1 and 3 both have BMW and Ford
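To experiment with the queries on this page without a Postgres server, the sample data can be loaded into an in-memory SQLite database with Python's standard `sqlite3` module. This is a sketch under assumptions: SQLite stands in for PostgreSQL (so `serial` becomes `INTEGER PRIMARY KEY`), and the helper names `build_sample_db` and `original_query` are hypothetical. It runs the question's two-`IN` query, adding only an `ORDER BY` so the output order is deterministic.

```python
import sqlite3

# The question's sample data, loaded into SQLite.
# INTEGER PRIMARY KEY plays the role of Postgres' serial column.
SCHEMA_AND_DATA = """
CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY,
  name        TEXT
);
CREATE TABLE customers_cars (
  customer_id INTEGER,
  car_make    TEXT
);
INSERT INTO customers(name) VALUES
  ('Joe Dirt'), ('Penny Price'), ('Wooten Nagen'), ('Captain Planet');
INSERT INTO customers_cars(customer_id, car_make) VALUES
  (1,'BMW'), (1,'Merc'), (1,'Ford'),
  (2,'BMW'), (2,'BMW'),            -- customer 2 has a duplicate BMW row
  (2,'Ferrari'), (2,'Porsche'),
  (3,'BMW'), (3,'Ford');
"""

def build_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: a fresh in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn

def original_query(conn: sqlite3.Connection) -> list:
    """The question's query: one IN subquery per required make."""
    sql = """
    SELECT DISTINCT c.*
    FROM   customers c
    WHERE  c.customer_id IN (SELECT customer_id FROM customers_cars WHERE car_make = 'BMW')
      AND  c.customer_id IN (SELECT customer_id FROM customers_cars WHERE car_make = 'Ford')
    ORDER  BY c.customer_id
    """
    return conn.execute(sql).fetchall()

print(original_query(build_sample_db()))  # [(1, 'Joe Dirt'), (3, 'Wooten Nagen')]
```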

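Other expectations

  • There are ~20 car_make values in the database
  • Each customer_id typically has 1-3 car_make values
  • No customer_id is expected to have more than 50 car_make assignments (usually 20-30)
  • Queries will usually look for only 2-3 specific car_make values per customer (e.g., BMW and Ford), not 10-20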

4 answers:

Answer 0 (score: 2)

Another option; I'm not sure which is fastest on large tables.

SELECT  customers.*
FROM    customers
    JOIN customers_cars USING(customer_id)
WHERE   car_make = ANY(ARRAY['BMW','Ford'])
GROUP BY
    customer_id, name
HAVING  array_agg(car_make) @> ARRAY['BMW','Ford'];
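PostgreSQL's `@>` array operator tests containment: it is true when every element of the right-hand array appears somewhere in the left-hand array, regardless of order or duplicates. That is why customer 2's duplicate BMW row does no harm here. A minimal pure-Python sketch of the same `GROUP BY` / `HAVING` logic, assuming the question's sample rows and a hypothetical `customers_with_all` helper, with Python set inclusion standing in for `@>`:

```python
from collections import defaultdict

# Sample (customer_id, car_make) rows from the question, duplicates included.
ROWS = [
    (1, "BMW"), (1, "Merc"), (1, "Ford"),
    (2, "BMW"), (2, "BMW"), (2, "Ferrari"), (2, "Porsche"),
    (3, "BMW"), (3, "Ford"),
]

def customers_with_all(rows, required):
    """Mimic: WHERE car_make = ANY(required) GROUP BY customer_id
    HAVING array_agg(car_make) @> required."""
    required = list(required)
    groups = defaultdict(list)                 # customer_id -> aggregated makes
    for customer_id, make in rows:
        if make in required:                   # WHERE car_make = ANY(...)
            groups[customer_id].append(make)   # array_agg(car_make)
    # @> behaves like set containment: order and duplicates are ignored.
    return sorted(cid for cid, makes in groups.items()
                  if set(required) <= set(makes))

print(customers_with_all(ROWS, ["BMW", "Ford"]))  # [1, 3]
```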

vol7ron: Fiddle

  

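Below is a modification of the above that uses the same idea of comparing arrays. I'm not sure how efficient it is compared to the two-subquery approach, since it has to build an array in one pass and then do heavier comparison work across the array's elements.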

SELECT DISTINCT c.* 
FROM   customers c
WHERE  customer_id IN (
  select   customer_id
  from     customers_cars 
  group by customer_id
  having   array_agg(car_make) @> ARRAY['BMW','Ford']
);

Answer 1 (score: 1)

I would write it as:

SELECT DISTINCT c.customer_id 
FROM   customers c
JOIN   customers_cars cc_f on c.customer_id = cc_f.customer_id and cc_f.car_make = 'Ford'
JOIN   customers_cars cc_b on c.customer_id = cc_b.customer_id and cc_b.car_make = 'BMW'
;

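I don't know whether this is better. In some RDBMSs plain joins like this perform better than subqueries, but I don't know about Postgres. It is also debatable from a readability standpoint.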
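One detail of the join form is worth seeing concretely: each extra join multiplies rows, so a customer with two BMW rows and one Ford row comes back twice unless `DISTINCT` is kept. A minimal sketch against SQLite (standing in for PostgreSQL), using the question's sample rows plus one hypothetical extra row, `(3, 'BMW')`, added only to give a matching customer a duplicate; `setup` and `join_ids` are illustrative names:

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customers_cars (customer_id INTEGER, car_make TEXT);
    INSERT INTO customers(name) VALUES
      ('Joe Dirt'), ('Penny Price'), ('Wooten Nagen'), ('Captain Planet');
    INSERT INTO customers_cars VALUES
      (1,'BMW'), (1,'Merc'), (1,'Ford'),
      (2,'BMW'), (2,'BMW'), (2,'Ferrari'), (2,'Porsche'),
      (3,'BMW'), (3,'Ford'),
      (3,'BMW');  -- hypothetical extra row: customer 3 now has two BMWs
    """)
    return conn

JOIN_SQL = """
SELECT {distinct} c.customer_id
FROM   customers c
JOIN   customers_cars cc_f ON c.customer_id = cc_f.customer_id AND cc_f.car_make = 'Ford'
JOIN   customers_cars cc_b ON c.customer_id = cc_b.customer_id AND cc_b.car_make = 'BMW'
ORDER  BY c.customer_id
"""

def join_ids(conn: sqlite3.Connection, distinct: bool) -> list:
    sql = JOIN_SQL.format(distinct="DISTINCT" if distinct else "")
    return [row[0] for row in conn.execute(sql)]

conn = setup()
print(join_ids(conn, distinct=False))  # [1, 3, 3]  (one row per Ford x BMW pair)
print(join_ids(conn, distinct=True))   # [1, 3]
```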

Answer 2 (score: 1)

It seems to me you are trying to find customers who own at least one BMW and at least one Ford. This query should work for you:

SELECT
     customers.customer_id
FROM
    customers
        INNER JOIN customers_cars ON
            customers.customer_id = customers_cars.customer_id
            AND customers_cars.car_make IN ('BMW', 'Ford')
GROUP BY
    customers.customer_id
HAVING
    -- COUNT skips NULLs, so each CASE counts only rows of one make
    COUNT(CASE WHEN car_make = 'BMW' THEN 1 ELSE NULL END) > 0
    AND COUNT(CASE WHEN car_make = 'Ford' THEN 1 ELSE NULL END) > 0;

Make sure you have indexes on customers_cars.customer_id and customers_cars.car_make for best performance.
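The conditional-aggregation query, together with the two suggested indexes, can be tried against SQLite as a rough sketch. Assumptions: SQLite stands in for PostgreSQL, and the index names `customers_cars_customer_id_idx` and `customers_cars_car_make_idx` are made up for illustration. Whether a planner actually uses either index depends on the engine and the data volume, so the sketch only checks the result set and that the indexes exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers_cars (customer_id INTEGER, car_make TEXT);
INSERT INTO customers(name) VALUES
  ('Joe Dirt'), ('Penny Price'), ('Wooten Nagen'), ('Captain Planet');
INSERT INTO customers_cars VALUES
  (1,'BMW'), (1,'Merc'), (1,'Ford'),
  (2,'BMW'), (2,'BMW'), (2,'Ferrari'), (2,'Porsche'),
  (3,'BMW'), (3,'Ford');
-- Hypothetical index names; the answer only says which columns to index.
CREATE INDEX customers_cars_customer_id_idx ON customers_cars (customer_id);
CREATE INDEX customers_cars_car_make_idx    ON customers_cars (car_make);
""")

# CASE without ELSE yields NULL, and COUNT ignores NULLs, so each
# COUNT(...) tallies only the rows of one make within the group.
SQL = """
SELECT customers.customer_id
FROM   customers
JOIN   customers_cars
       ON  customers.customer_id = customers_cars.customer_id
       AND customers_cars.car_make IN ('BMW', 'Ford')
GROUP  BY customers.customer_id
HAVING COUNT(CASE WHEN car_make = 'BMW'  THEN 1 END) > 0
   AND COUNT(CASE WHEN car_make = 'Ford' THEN 1 END) > 0
ORDER  BY customers.customer_id
"""

ids = [row[0] for row in conn.execute(SQL)]
print(ids)  # [1, 3]
```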

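Answer 3 (score: 0)

You don't need to join customers at all (given referential integrity).

In general, this is a case of relational division. We have assembled an arsenal of techniques under this related question:

Unique combinations

If customers_cars were defined UNIQUE on (customer_id, car_make), it would be even simpler: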

SELECT customer_id
FROM   customers_cars
WHERE  car_make IN ('BMW', 'Ford')
GROUP  BY 1
HAVING count(*) = 2;
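The `count(*) = 2` shortcut depends entirely on the uniqueness assumption. A minimal sketch against SQLite (standing in for PostgreSQL; the helper `count_star_ids` is hypothetical) runs it on the question's actual data, where customer 2's duplicate BMW row is counted twice, and then on a copy protected by a `UNIQUE (customer_id, car_make)` constraint. `GROUP BY customer_id` is written out instead of `GROUP BY 1`; both mean the same here.

```python
import sqlite3

ROWS = [
    (1, 'BMW'), (1, 'Merc'), (1, 'Ford'),
    (2, 'BMW'), (2, 'BMW'), (2, 'Ferrari'), (2, 'Porsche'),
    (3, 'BMW'), (3, 'Ford'),
]

COUNT_STAR_SQL = """
SELECT customer_id
FROM   customers_cars
WHERE  car_make IN ('BMW', 'Ford')
GROUP  BY customer_id
HAVING count(*) = 2
ORDER  BY customer_id
"""

def count_star_ids(unique: bool) -> list:
    conn = sqlite3.connect(":memory:")
    constraint = ", UNIQUE (customer_id, car_make)" if unique else ""
    conn.execute(f"CREATE TABLE customers_cars (customer_id INTEGER, car_make TEXT{constraint})")
    # With the constraint, INSERT OR IGNORE silently drops the duplicate BMW row.
    conn.executemany("INSERT OR IGNORE INTO customers_cars VALUES (?, ?)", ROWS)
    return [row[0] for row in conn.execute(COUNT_STAR_SQL)]

print(count_star_ids(unique=False))  # [1, 2, 3]  (customer 2 is a false positive)
print(count_star_ids(unique=True))   # [1, 3]
```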

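Combinations not unique

Since (customer_id, car_make) is not unique, we need an extra step.

For only a few cars, your original query is not that bad. But EXISTS is typically faster than IN (especially with duplicates!), and we don't need the final DISTINCT: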

SELECT customer_id -- no DISTINCT needed.
FROM   customers c
WHERE  EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'BMW')
AND    EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'Ford');
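The `EXISTS` version is portable, so it can be checked quickly against SQLite (standing in for PostgreSQL) with the question's sample rows. This sketch only confirms that the semi-joins return each qualifying customer exactly once without `DISTINCT`; it says nothing about speed, which depends on the planner and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers_cars (customer_id INTEGER, car_make TEXT);
INSERT INTO customers(name) VALUES
  ('Joe Dirt'), ('Penny Price'), ('Wooten Nagen'), ('Captain Planet');
INSERT INTO customers_cars VALUES
  (1,'BMW'), (1,'Merc'), (1,'Ford'),
  (2,'BMW'), (2,'BMW'), (2,'Ferrari'), (2,'Porsche'),
  (3,'BMW'), (3,'Ford');
""")

# EXISTS only asks whether at least one matching row exists, so duplicate
# customers_cars rows can never multiply the outer customers row.
EXISTS_SQL = """
SELECT customer_id
FROM   customers c
WHERE  EXISTS (SELECT 1 FROM customers_cars
               WHERE customer_id = c.customer_id AND car_make = 'BMW')
AND    EXISTS (SELECT 1 FROM customers_cars
               WHERE customer_id = c.customer_id AND car_make = 'Ford')
ORDER  BY customer_id
"""

ids = [row[0] for row in conn.execute(EXISTS_SQL)]
print(ids)  # [1, 3]
```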

For longer lists of cars, the query above becomes verbose and less efficient. For an arbitrary number of cars, I suggest:

SELECT customer_id
FROM  (
   SELECT customer_id, car_make
   FROM   customers_cars
   WHERE  car_make IN ('BMW', 'Ford')
   GROUP  BY 1, 2
   ) sub
GROUP  BY 1
HAVING count(*) = 2;
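Because the inner `GROUP BY customer_id, car_make` collapses duplicates first, the outer `count(*)` only has to equal the number of distinct makes requested, which makes the query easy to generate for any list. A sketch in Python with SQLite standing in for PostgreSQL, assuming a hypothetical helper `customers_with_all_makes` that builds one `?` placeholder per make and binds the values as parameters rather than splicing them into the SQL text:

```python
import sqlite3

def customers_with_all_makes(conn: sqlite3.Connection, makes) -> list:
    """Return customer_ids owning every make in `makes` (relational division)."""
    wanted = sorted(set(makes))                  # de-duplicate the request itself
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
    SELECT customer_id
    FROM  (
       SELECT customer_id, car_make
       FROM   customers_cars
       WHERE  car_make IN ({placeholders})
       GROUP  BY customer_id, car_make          -- collapse duplicate rows
       ) sub
    GROUP  BY customer_id
    HAVING count(*) = ?                         -- one row per distinct wanted make
    ORDER  BY customer_id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_cars (customer_id INTEGER, car_make TEXT)")
conn.executemany("INSERT INTO customers_cars VALUES (?, ?)", [
    (1, 'BMW'), (1, 'Merc'), (1, 'Ford'),
    (2, 'BMW'), (2, 'BMW'), (2, 'Ferrari'), (2, 'Porsche'),
    (3, 'BMW'), (3, 'Ford'),
])

print(customers_with_all_makes(conn, ['BMW', 'Ford']))          # [1, 3]
print(customers_with_all_makes(conn, ['BMW', 'Ford', 'Merc']))  # [1]
```

De-duplicating the requested list matters: without it, asking for `['BMW', 'BMW', 'Ford']` would demand a count of 3 that no customer can reach.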

SQL Fiddle.