Question

This question follows on from this one.

I now have this code:

select
    min(purchaseDate) filter (where fruitType = 'apple') as appleFirstPurchaseDate,
    min(purchaseDate) filter (where fruitType = 'orange') as orangeFirstPurchaseDate,
    customer
from fruitPurchases
group by customer

which gives the following output:

appleFirstPurchaseDate  orangeFirstPurchaseDate
----------------------  -----------------------
       2017-05-03              2016-11-25

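You should also know that each fruit has an ID. Using this ID, I want to create two keys: one that concatenates the fruit ID with appleFirstPurchaseDate, and one that concatenates the fruit ID with orangeFirstPurchaseDate.

So I tried this: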

select
    fruitId || '-' || min(purchaseDate) filter (where fruitType = 'apple') as appleKey,
    min(purchaseDate) filter (where fruitType = 'apple') as apple,
    fruitId || '-' || min(purchaseDate) filter (where fruitType = 'orange') as orangeKey,
    min(purchaseDate) filter (where fruitType = 'orange') as orange,
    customer
from fruitPurchases
group by customer, fruitId

But when a customer has bought both apples and oranges, it unfortunately gives me this:

  appleKey        appleFirstPurchaseDate    orangeKey    orangeFirstPurchaseDate
  --------        ----------------------    ---------    -----------------------
283-2017-05-03          2017-05-03           [NULL]             [NULL]
   [NULL]                 [NULL]          322-2016-11-25      2016-11-25

whereas I would like this:

  appleKey        appleFirstPurchaseDate    orangeKey    orangeFirstPurchaseDate
  --------        ----------------------    ---------    -----------------------
283-2017-05-03          2017-05-03       322-2016-11-25        2016-11-25

One last piece of information: an earlier FirstPurchaseDate does not imply a lower fruitID.

Answer 1

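This is a combination of a "greatest-n-per-group" problem followed by a "pivot".

General assumptions:

All involved columns are defined NOT NULL - or sorting and concatenation get problematic.
(customer, fruitType, purchaseDate) is unique - or you need to define rules for how to break ties.
You want only the two fruits in the result.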
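
If the uniqueness assumption does not hold, a tie-breaking rule has to be added to the sort order. As a minimal sketch, assuming (purely for illustration, the question does not say so) that the lowest fruitId should win among purchases on the same date:

```sql
-- Sketch: earliest purchase per (customer, fruitType).
-- Assumption: ties on purchaseDate are broken by the smallest fruitId.
SELECT DISTINCT ON (customer, fruitType)
       customer, fruitType, fruitId, purchaseDate
FROM   fruitPurchases
ORDER  BY customer, fruitType, purchaseDate, fruitId;
```

Without the trailing fruitId in ORDER BY, the row picked among tied dates would be arbitrary.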

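The basic problem in your query: you need only customer in the GROUP BY clause, because you want one row per customer. Not customer, fruitId, which produces one row per combination of customer and fruitId.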
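
To see the effect of the grouping columns in isolation, here is a small sketch with inline sample data. The IDs and dates are taken from the question; the customer name 'alice' is made up for illustration.

```sql
-- Inline stand-in for the real table; 'alice' is a hypothetical customer.
WITH fruitPurchases (customer, fruitId, fruitType, purchaseDate) AS (
   VALUES ('alice', 283, 'apple',  date '2017-05-03')
        , ('alice', 322, 'orange', date '2016-11-25')
   )
SELECT customer, count(*) AS purchases
FROM   fruitPurchases
GROUP  BY customer;   -- a single row for alice, covering both purchases

-- With GROUP BY customer, fruitId the same data yields two rows,
-- one per fruitId, which is why the apple and orange columns
-- ended up on separate rows in the question.
```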

But there is no built-in aggregate function that retrieves, in a single step, the fruitId from the same row that holds the earliest purchaseDate per (customer, fruitType).

You could make your query work by appending fruitId (instead of prepending it), because the concatenated text still sorts the earliest date first, but that is very ugly and needlessly slow:

SELECT customer
     , min(purchaseDate || '-' || fruitId) FILTER (WHERE fruitType = 'apple')  AS appleKey
     , min(purchaseDate)                   FILTER (WHERE fruitType = 'apple')  AS apple
     , min(purchaseDate || '-' || fruitId) FILTER (WHERE fruitType = 'orange') AS orangeKey
     , min(purchaseDate)                   FILTER (WHERE fruitType = 'orange') AS orange
FROM   fruitPurchases
GROUP  BY customer;

I would not use that, though.

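There are the related window functions first_value() and last_value(), but those do not aggregate. And you cannot use the FILTER clause with them, since it is only available for aggregate functions. So you would need an additional query level, and then it is simpler to just mark the first row of each group with the window function row_number() in a subquery or CTE ...

@Gordon made it work with inverted frame definitions for the window functions. Consider this simplified, complete and optimized version: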

SELECT DISTINCT ON (customer)
       customer
     , first_value(fruitId || '-' || purchaseDate) OVER a AS appleKey
     , first_value(purchaseDate)                   OVER a AS appleFirstPurchaseDate
     , first_value(fruitId || '-' || purchaseDate) OVER o AS orangeKey
     , first_value(purchaseDate)                   OVER o AS orangeFirstPurchaseDate
FROM  (SELECT * FROM fruitPurchases WHERE fruitType IN ('apple', 'orange')) sub
WINDOW a AS (PARTITION BY customer ORDER BY fruitType ASC , purchaseDate)
     , o AS (PARTITION BY customer ORDER BY fruitType DESC, purchaseDate);

But this should be faster:

WITH cte AS (
   SELECT DISTINCT ON (customer, fruitType)
          customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
   FROM   fruitPurchases
   WHERE  fruitType IN ('apple', 'orange')
   ORDER  BY customer, fruitType, purchaseDate
   )
SELECT customer
     , a.key          AS appleKey
     , a.purchaseDate AS appleFirstPurchaseDate
     , o.key          AS orangeKey
     , o.purchaseDate AS orangeFirstPurchaseDate  -- taken from o, not a
FROM   cte a
JOIN   cte o USING (customer)
WHERE  a.fruitType = 'apple'
AND    o.fruitType = 'orange';

An index on (customer, fruitType, purchaseDate) would help, or one on (customer, fruitType, purchaseDate, fruitId) if your table allows index-only scans. Details depend on undisclosed information. Related:

How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?
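
As a rough sketch, the two index variants mentioned above could be created like this. The index names are hypothetical, and whether either index actually pays off depends on table size and data distribution, which the question does not disclose.

```sql
-- Matches the ORDER BY of the DISTINCT ON query (index name is hypothetical).
CREATE INDEX fruitpurchases_cust_type_date_idx
    ON fruitPurchases (customer, fruitType, purchaseDate);

-- Variant that also carries fruitId, so the query can potentially be
-- answered by an index-only scan (index name is hypothetical).
CREATE INDEX fruitpurchases_cust_type_date_id_idx
    ON fruitPurchases (customer, fruitType, purchaseDate, fruitId);
```

Only one of the two is needed. Running the query under EXPLAIN (ANALYZE) shows whether the planner picks the index.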

The CTE uses DISTINCT ON to get the greatest-n-per-group rows:

Select first row in each GROUP BY group?

Depending on actual data distribution, there may be faster techniques:

Optimize GROUP BY query to retrieve latest record per user

The outer SELECT is a simple pivot technique. It works for any number of fruits.

Using the aforementioned window function row_number() in a CTE:
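
To illustrate the "any number of fruits" point, here is a sketch that adds a third fruit with one more self-join of the CTE. The fruit type 'banana' and the column name bananaKey are hypothetical and not part of the question's data.

```sql
WITH cte AS (
   SELECT DISTINCT ON (customer, fruitType)
          customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
   FROM   fruitPurchases
   WHERE  fruitType IN ('apple', 'orange', 'banana')  -- 'banana' is hypothetical
   ORDER  BY customer, fruitType, purchaseDate
   )
SELECT customer
     , a.key AS appleKey
     , o.key AS orangeKey
     , b.key AS bananaKey
FROM   cte a
JOIN   cte o USING (customer)
JOIN   cte b USING (customer)   -- one extra join per additional fruit
WHERE  a.fruitType = 'apple'
AND    o.fruitType = 'orange'
AND    b.fruitType = 'banana';
```

Because these are inner joins, only customers who bought every listed fruit appear in the result.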

WITH cte AS (
   SELECT customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
        , row_number() OVER (PARTITION BY customer, fruitType ORDER BY purchaseDate) AS rn
   FROM   fruitPurchases
   WHERE  fruitType IN ('apple', 'orange')
   )
SELECT customer
     , a.key          AS appleKey
     , a.purchaseDate AS appleFirstPurchaseDate
     , o.key          AS orangeKey
     , o.purchaseDate AS orangeFirstPurchaseDate  -- taken from o, not a
FROM   cte a
JOIN   cte o USING (customer, rn)
WHERE  a.rn = 1
AND    a.fruitType = 'apple'
AND    o.fruitType = 'orange';

dbfiddle here
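
The inner joins above drop customers who bought only one of the two fruits. If such customers should still show up with NULL in the missing columns (an assumption, since the question only shows a customer with both fruits), a FULL JOIN between the two filtered halves of the CTE is one possible sketch:

```sql
WITH cte AS (
   SELECT DISTINCT ON (customer, fruitType)
          customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
   FROM   fruitPurchases
   WHERE  fruitType IN ('apple', 'orange')
   ORDER  BY customer, fruitType, purchaseDate
   )
SELECT customer
     , a.key          AS appleKey
     , a.purchaseDate AS appleFirstPurchaseDate
     , o.key          AS orangeKey
     , o.purchaseDate AS orangeFirstPurchaseDate
FROM  (SELECT * FROM cte WHERE fruitType = 'apple')  a
FULL   JOIN
      (SELECT * FROM cte WHERE fruitType = 'orange') o USING (customer);
```

The fruit filter moves into the subqueries because a WHERE condition on a.fruitType or o.fruitType after the join would discard the NULL-extended rows again.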

But I would first seriously consider normalizing your DB design, which would make the task simpler and faster.

Answer 2

This might be easier with window functions:

select distinct customer,
       first_value(fruitId || '-' || purchaseDate) filter (where fruittype = 'apple') over (partition by customer order by purchaseDate) as appleKey,
       first_value(fruitId || '-' || purchaseDate) filter (where fruittype = 'orange') over (partition by customer order by purchaseDate) as orangeKey
from fruitPurchases;

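PostgreSQL accepts FILTER only for aggregate functions, not for first_value(), so the query above is rejected there. You can write it as this instead: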

select distinct customer,
       first_value(case when fruittype = 'apple' then fruitId || '-' || purchaseDate end) over (partition by customer order by (fruittype = 'apple')::int desc, purchaseDate) as appleKey,
       first_value(case when fruittype = 'orange' then fruitId || '-' || purchaseDate end) over (partition by customer order by (fruittype = 'orange')::int desc, purchaseDate) as orangeKey
from fruitPurchases;

Here is a SQL Fiddle.

Concatenation with FILTER clause
