Concatenation with the FILTER clause

Time: 2017-11-14 21:20:28

Tags: sql postgresql pivot greatest-n-per-group

This question follows this one.

I now have this code:

select
    min(purchaseDate) filter (where fruitType = 'apple') as appleFirstPurchaseDate,
    min(purchaseDate) filter (where fruitType = 'orange') as orangeFirstPurchaseDate,
    customer
from fruitPurchases
group by customer

which gives the following output:

appleFirstPurchaseDate  orangeFirstPurchaseDate
----------------------  -----------------------
       2017-05-03              2016-11-25

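You should also know that every fruit has an ID. Using this ID, I want to create two keys: one that concatenates the fruit ID with appleFirstPurchaseDate, and another that concatenates the fruit ID with orangeFirstPurchaseDate.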

So I tried this:

select
    fruitId || '-' || min(purchaseDate) filter (where fruitType = 'apple') as appleKey,
    min(purchaseDate) filter (where fruitType = 'apple') as apple,
    fruitId || '-' || min(purchaseDate) filter (where fruitType = 'orange') as orangeKey,
    min(purchaseDate) filter (where fruitType = 'orange') as orange,
    customer
from fruitPurchases
group by customer, fruitId

But when a customer has bought both apples and oranges, it unfortunately gives me this:

  appleKey        appleFirstPurchaseDate    orangeKey    orangeFirstPurchaseDate
  --------        ----------------------    ---------    -----------------------
283-2017-05-03          2017-05-03           [NULL]             [NULL]
   [NULL]                 [NULL]          322-2016-11-25      2016-11-25

Whereas I would like this:

  appleKey        appleFirstPurchaseDate    orangeKey    orangeFirstPurchaseDate
  --------        ----------------------    ---------    -----------------------
283-2017-05-03          2017-05-03       322-2016-11-25        2016-11-25

One last piece of information: an earlier FirstPurchaseDate does not imply a lower fruitId.

2 answers:

Answer 0: (score: 3)

This is a "greatest-n-per-group" problem, followed by a "pivot".

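General assumptions:

  • All involved columns are defined NOT NULL - or sorting and concatenation become problematic.

  • (customer, fruitType, purchaseDate) is unique - or you need to define rules for how to break ties.

  • You want the result to contain only the two fruits.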

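The basic problem in your query: you need only customer in the GROUP BY clause, since you want one row per customer. Not customer, fruitId, which produces one row for every combination of customer and fruitId.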

But there is no built-in aggregate function that retrieves, in a single step, the fruitId from the same row that also holds the earliest purchaseDate per (customer, fruitType).
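The effect of the GROUP BY columns can be seen on a tiny made-up table. This is only a sketch: the table name fruitPurchasesDemo and the column types are assumptions, and the two rows simply mirror the values shown in the question.

```sql
-- Hypothetical demo table; name and column types are assumptions,
-- not the asker's real schema.
CREATE TEMP TABLE fruitPurchasesDemo (
   customer     int  NOT NULL
 , fruitId      int  NOT NULL
 , fruitType    text NOT NULL
 , purchaseDate date NOT NULL
);

INSERT INTO fruitPurchasesDemo VALUES
   (1, 283, 'apple',  '2017-05-03')
 , (1, 322, 'orange', '2016-11-25');

-- Grouping by customer alone: both fruits land in the same group,
-- so there is one row per customer with both dates filled in.
SELECT customer
     , min(purchaseDate) FILTER (WHERE fruitType = 'apple')  AS apple
     , min(purchaseDate) FILTER (WHERE fruitType = 'orange') AS orange
FROM   fruitPurchasesDemo
GROUP  BY customer;

-- Adding fruitId to GROUP BY: every fruitId forms its own group,
-- so there are two rows, each with one of the two aggregates NULL.
SELECT customer, fruitId
     , min(purchaseDate) FILTER (WHERE fruitType = 'apple')  AS apple
     , min(purchaseDate) FILTER (WHERE fruitType = 'orange') AS orange
FROM   fruitPurchasesDemo
GROUP  BY customer, fruitId;
```

The second query shows why fruitId cannot simply be added to the grouping: it splits the apple and the orange purchase into separate groups before the FILTER clauses are evaluated.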

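You could make your query work by appending fruitId (instead of prepending it), because the concatenated text still sorts by the earliest date first, but that is very ugly and needlessly slow: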

-- The key is built date-first (date, then '-', then fruitId) so that min() on the text sorts by date.
SELECT customer
     , min(purchaseDate || '-' || fruitId) FILTER (WHERE fruitType = 'apple')  AS appleKey
     , min(purchaseDate)                   FILTER (WHERE fruitType = 'apple')  AS apple
     , min(purchaseDate || '-' || fruitId) FILTER (WHERE fruitType = 'orange') AS orangeKey
     , min(purchaseDate)                   FILTER (WHERE fruitType = 'orange') AS orange
FROM   fruitPurchases
GROUP  BY customer;

I would not use that, though.

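There are the related window functions first_value() and last_value(), but those do not aggregate. And you cannot use the FILTER clause with them, since it is only implemented for aggregate functions. So you need an additional query level, and then it is simpler to just use the window function row_number() to mark the first row of each group in a subquery or CTE ...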

@Gordon made it work with inverted frame definitions for the window functions. Consider this simplified, complete and optimized version:

SELECT DISTINCT ON (customer)
       customer
     , first_value(fruitId || '-' || purchaseDate) OVER a AS appleKey
     , first_value(purchaseDate)                   OVER a AS appleFirstPurchaseDate
     , first_value(fruitId || '-' || purchaseDate) OVER o AS orangeKey
     , first_value(purchaseDate)                   OVER o AS orangeFirstPurchaseDate
FROM  (SELECT * FROM fruitPurchases WHERE fruitType IN ('apple', 'orange')) sub
WINDOW a AS (PARTITION BY customer ORDER BY fruittype ASC , purchaseDate)
     , o AS (PARTITION BY customer ORDER BY fruittype DESC, purchaseDate);

But this should be faster:

WITH cte AS (
   SELECT DISTINCT ON (customer, fruitType)
          customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
   FROM   fruitPurchases
   WHERE  fruitType IN ('apple', 'orange')
   ORDER  BY customer, fruitType, purchaseDate
   )
SELECT customer
     , a.key          AS appleKey
     , a.purchaseDate AS appleFirstPurchaseDate
     , o.key          AS orangeKey
     , o.purchaseDate AS orangeFirstPurchaseDate
FROM   cte a
JOIN   cte o USING (customer)
WHERE  a.fruitType = 'apple'
AND    o.fruitType = 'orange';

An index on (customer, fruitType, purchaseDate) would help or, if your table allows index-only scans, an index on (customer, fruitType, purchaseDate, fruitId). Details depend on undisclosed information.
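As a rough sketch, the two index variants could be created like this. The index names are hypothetical, and the statements assume the fruitPurchases table from the question exists; typically only one of the two would be kept.

```sql
-- Index names are made up for illustration; the table is the one from the question.

-- Matches the sort order used to find the earliest purchase
-- per (customer, fruitType).
CREATE INDEX fruitpurchases_cust_type_date_idx
    ON fruitPurchases (customer, fruitType, purchaseDate);

-- Variant that also carries fruitId, so that an index-only scan
-- can supply every column the query reads.
CREATE INDEX fruitpurchases_cust_type_date_id_idx
    ON fruitPurchases (customer, fruitType, purchaseDate, fruitId);
```

Whether the planner actually uses either index depends on table size, data distribution and visibility-map state, so checking with EXPLAIN on real data is advisable.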

The CTE computes the greatest per group with DISTINCT ON.

Depending on the actual data distribution, there may be faster techniques.

The outer SELECT is a simple pivot technique. It works for any number of fruits.
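The pivot step can also be written with conditional aggregation instead of a self-join. The following is a sketch of that alternative, not the query from this answer: it keeps the same DISTINCT ON CTE and swaps the outer join for min() with FILTER. The demo table fruitPurchasesDemo and its rows are made up for illustration.

```sql
-- Hypothetical demo data; table name, types and rows are assumptions.
CREATE TEMP TABLE fruitPurchasesDemo (
   customer     int  NOT NULL
 , fruitId      int  NOT NULL
 , fruitType    text NOT NULL
 , purchaseDate date NOT NULL
);

INSERT INTO fruitPurchasesDemo VALUES
   (1, 283, 'apple',  '2017-05-03')
 , (1, 150, 'apple',  '2017-09-01')   -- later purchase with a lower fruitId
 , (1, 322, 'orange', '2016-11-25')
 , (2, 400, 'apple',  '2017-01-10');  -- customer 2 never bought oranges

WITH cte AS (
   -- Earliest purchase per (customer, fruitType), as in the answer's CTE.
   SELECT DISTINCT ON (customer, fruitType)
          customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
   FROM   fruitPurchasesDemo
   WHERE  fruitType IN ('apple', 'orange')
   ORDER  BY customer, fruitType, purchaseDate
   )
-- Each group holds at most one row per fruitType, so min() just picks it.
SELECT customer
     , min(key)          FILTER (WHERE fruitType = 'apple')  AS appleKey
     , min(purchaseDate) FILTER (WHERE fruitType = 'apple')  AS appleFirstPurchaseDate
     , min(key)          FILTER (WHERE fruitType = 'orange') AS orangeKey
     , min(purchaseDate) FILTER (WHERE fruitType = 'orange') AS orangeFirstPurchaseDate
FROM   cte
GROUP  BY customer
ORDER  BY customer;
```

One design difference from the self-join: a customer who bought only one of the two fruits still appears, with NULL in the columns for the missing fruit, whereas the inner join drops such customers.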

Alternatively, with the window function row_number() mentioned above in the CTE:

WITH cte AS (
   SELECT customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
        , row_number() OVER (PARTITION BY customer, fruitType ORDER BY purchaseDate) AS rn
   FROM   fruitPurchases
   WHERE  fruitType IN ('apple', 'orange')
   )
SELECT customer
     , a.key          AS appleKey
     , a.purchaseDate AS appleFirstPurchaseDate
     , o.key          AS orangeKey
     , o.purchaseDate AS orangeFirstPurchaseDate
FROM   cte a
JOIN   cte o USING (customer, rn)
WHERE  a.rn = 1
AND    a.fruitType = 'apple'
AND    o.fruitType = 'orange';

dbfiddle here

But I would seriously consider normalizing your DB design first, which would make the task simpler and faster.

Answer 1: (score: 1)

This might be easier with window functions:

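-- Note: PostgreSQL implements FILTER only for aggregate functions, so it rejects
-- this form for first_value(); the CASE variant further down is the one that runs there.
select distinct customer,
       first_value(fruitId || '-' || purchaseDate) filter (where fruittype = 'apple') over (partition by customer order by purchaseDate) as appleKey,
       first_value(fruitId || '-' || purchaseDate) filter (where fruittype = 'orange') over (partition by customer order by purchaseDate) as orangeKey
from fruitPurchases;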

You can also write it as:

select distinct customer,
       first_value(case when fruittype = 'apple' then fruitId || '-' || purchaseDate end) over (partition by customer order by (fruittype = 'apple')::int desc, purchaseDate) as appleKey,
       first_value(case when fruittype = 'orange' then fruitId || '-' || purchaseDate end) over (partition by customer order by (fruittype = 'orange')::int desc, purchaseDate) as orangeKey
from fruitPurchases;

Here is a SQL Fiddle.