This question follows on from this one.
I now have this code:
select
min(purchaseDate) filter (where fruitType = 'apple') as appleFirstPurchaseDate,
min(purchaseDate) filter (where fruitType = 'orange') as orangeFirstPurchaseDate,
customer
from fruitPurchases
group by customer
which gives the following output:
appleFirstPurchaseDate  orangeFirstPurchaseDate
----------------------  -----------------------
2017-05-03              2016-11-25
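You should also know that each fruit has an ID. Using this ID, I want to create two keys: one that concatenates the fruit ID with appleFirstPurchaseDate, and another that concatenates the fruit ID with orangeFirstPurchaseDate.
So I tried this: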
select
fruitId || '-' || min(purchaseDate) filter (where fruitType = 'apple') as appleKey,
min(purchaseDate) filter (where fruitType = 'apple') as apple,
fruitId || '-' || min(purchaseDate) filter (where fruitType = 'orange') as orangeKey,
min(purchaseDate) filter (where fruitType = 'orange') as orange,
customer
from fruitPurchases
group by customer, fruitId
But when a customer has bought both apples and oranges, it unfortunately gives me this:
appleKey        appleFirstPurchaseDate  orangeKey       orangeFirstPurchaseDate
--------------  ----------------------  --------------  -----------------------
283-2017-05-03  2017-05-03              [NULL]          [NULL]
[NULL]          [NULL]                  322-2016-11-25  2016-11-25
whereas I would like this:
appleKey        appleFirstPurchaseDate  orangeKey       orangeFirstPurchaseDate
--------------  ----------------------  --------------  -----------------------
283-2017-05-03  2017-05-03              322-2016-11-25  2016-11-25
One last piece of information: an earlier FirstPurchaseDate does not imply a lower fruitID.
Answer 0 (score: 3)
This is a "greatest-n-per-group" problem, followed by a "pivot".
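General assumptions:
All involved columns are defined NOT NULL, or sorting and concatenation run into problems.
(customer, fruitType, purchaseDate) is unique, or you need to define rules for how to break ties.
You want the result to contain only the two fruits.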
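The basic problem in your query: you need only customer in the GROUP BY clause, since you want one row per customer. Not customer, fruitid, which produces one row for each combination of customer and fruitid.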
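The effect of the grouping columns can be reproduced with a minimal sketch. The table layout, the column types and the customer name 'alice' are assumptions for illustration; only the fruit IDs and dates come from the question.

```sql
-- Hypothetical table layout and sample data (not from the question)
CREATE TEMP TABLE fruitPurchases (
   customer     text NOT NULL
 , fruitId      int  NOT NULL
 , fruitType    text NOT NULL
 , purchaseDate date NOT NULL
);

INSERT INTO fruitPurchases VALUES
   ('alice', 283, 'apple' , '2017-05-03')
 , ('alice', 322, 'orange', '2016-11-25');

-- One group per (customer, fruitId): two rows for alice
SELECT customer, fruitId, count(*) AS purchases
FROM   fruitPurchases
GROUP  BY customer, fruitId;

-- One group per customer: a single row for alice
SELECT customer, count(*) AS purchases
FROM   fruitPurchases
GROUP  BY customer;
```

Every aggregate with a FILTER clause is evaluated once per group, so with the first grouping the apple columns and the orange columns can never end up in the same row.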
But there is no built-in aggregate function that retrieves, in a single step, the fruitID from the same row that also holds the earliest purchaseDate for each (customer, fruitType).
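You could make your query work by appending fruitID (instead of prepending it), because the concatenated text would still sort the earliest date first, but this is very ugly and needlessly slow: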
SELECT customer
, min(purchaseDate || '-' || fruitId) FILTER (WHERE fruitType = 'apple') AS appleKey
, min(purchaseDate) FILTER (WHERE fruitType = 'apple') AS apple
, min(purchaseDate || '-' || fruitId) FILTER (WHERE fruitType = 'orange') AS orangeKey
, min(purchaseDate) FILTER (WHERE fruitType = 'orange') AS orange
FROM fruitPurchases
GROUP BY customer;
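To illustrate why the key has to lead with the date, here is a minimal sketch comparing both concatenation orders. The two rows are hypothetical apple purchases of a single customer, reusing the IDs and dates from the question. It assumes dates are rendered in ISO format (YYYY-MM-DD, the default DateStyle), so that text order matches chronological order.

```sql
-- Hypothetical data: two apple purchases of the same customer.
-- The earlier purchase (2016-11-25) carries the higher fruitId (322).
WITH apples (fruitId, purchaseDate) AS (
   VALUES (283, date '2017-05-03')
        , (322, date '2016-11-25')
   )
SELECT min(fruitId || '-' || purchaseDate) AS id_first    -- compares fruitId first
     , min(purchaseDate || '-' || fruitId) AS date_first  -- compares the date first
FROM   apples;
```

With the ID in front, min() picks the row with the lowest fruitId (283), which is the later purchase. With the date in front, it picks the purchase from 2016-11-25 along with its fruitId 322.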
I would not use this concatenation trick, though.
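There are the related window functions first_value() and last_value(), but those do not aggregate. And you cannot combine them with the FILTER clause, which is only available for aggregate functions. So you need an additional query level, and then it is simpler to just use the window function row_number() to mark the first row of each group in a subquery or CTE ...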
@Gordon made it work with inverted frame definitions for the window functions. Consider this simplified, complete and optimized version:
SELECT DISTINCT ON (customer)
customer
, first_value(fruitId || '-' || purchaseDate) OVER a AS appleKey
, first_value(purchaseDate) OVER a AS appleFirstPurchaseDate
, first_value(fruitId || '-' || purchaseDate) OVER o AS orangeKey
, first_value(purchaseDate) OVER o AS orangeFirstPurchaseDate
FROM (SELECT * FROM fruitPurchases WHERE fruitType IN ('apple', 'orange')) sub
WINDOW a AS (PARTITION BY customer ORDER BY fruittype ASC , purchaseDate)
, o AS (PARTITION BY customer ORDER BY fruittype DESC, purchaseDate);
But this should be faster:
WITH cte AS (
SELECT DISTINCT ON (customer, fruitType)
customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
FROM fruitPurchases
WHERE fruitType IN ('apple', 'orange')
ORDER BY customer, fruitType, purchaseDate
)
SELECT customer
, a.key AS appleKey
, a.purchaseDate AS appleFirstPurchaseDate
, o.key AS orangeKey
, o.purchaseDate AS orangeFirstPurchaseDate
FROM cte a
JOIN cte o USING (customer)
WHERE a.fruitType = 'apple'
AND o.fruitType = 'orange';
An index on (customer, fruitType, purchaseDate) would do, or one on (customer, fruitType, purchaseDate, fruitId) if your table allows index-only scans. Details depend on undisclosed information.
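As a sketch, the two index variants could be created like this. The index names are hypothetical, and which variant pays off depends on the details the question does not disclose.

```sql
-- Variant 1: supports the sort order used by DISTINCT ON / row_number().
-- The index name is an assumption for illustration.
CREATE INDEX fruitpurchases_cust_type_date_idx
   ON fruitPurchases (customer, fruitType, purchaseDate);

-- Variant 2: same leading columns with fruitId appended, so the query
-- can be answered from the index alone where index-only scans are possible.
CREATE INDEX fruitpurchases_cust_type_date_id_idx
   ON fruitPurchases (customer, fruitType, purchaseDate, fruitId);
```

Only one of the two is needed; the second is larger but covers every column the query reads.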
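The CTE uses DISTINCT ON to compute the greatest per group.
Depending on actual data distribution, there may be faster techniques.
The outer SELECT is a simple pivot technique. It works for any number of fruits.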
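To show how the pivot scales, here is a sketch that extends the same pattern to a third fruit. The fruit type 'banana' and the banana column names are hypothetical and do not appear in the question.

```sql
WITH cte AS (
   SELECT DISTINCT ON (customer, fruitType)
          customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
   FROM   fruitPurchases
   WHERE  fruitType IN ('apple', 'orange', 'banana')  -- 'banana' is hypothetical
   ORDER  BY customer, fruitType, purchaseDate        -- earliest purchase first
   )
SELECT customer
     , a.key          AS appleKey
     , a.purchaseDate AS appleFirstPurchaseDate
     , o.key          AS orangeKey
     , o.purchaseDate AS orangeFirstPurchaseDate
     , b.key          AS bananaKey
     , b.purchaseDate AS bananaFirstPurchaseDate
FROM   cte a
JOIN   cte o USING (customer)   -- one extra self-join per additional fruit
JOIN   cte b USING (customer)
WHERE  a.fruitType = 'apple'
AND    o.fruitType = 'orange'
AND    b.fruitType = 'banana';
```

Because these are inner joins, only customers who bought every listed fruit appear in the result.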
The same with the window function row_number() mentioned above, in a CTE:
WITH cte AS (
SELECT customer, fruitType, fruitId || '-' || purchaseDate AS key, purchaseDate
, row_number() OVER (PARTITION BY customer, fruitType ORDER BY purchaseDate) AS rn
FROM fruitPurchases
WHERE fruitType IN ('apple', 'orange')
)
SELECT customer
, a.key AS appleKey
, a.purchaseDate AS appleFirstPurchaseDate
, o.key AS orangeKey
, o.purchaseDate AS orangeFirstPurchaseDate
FROM cte a
JOIN cte o USING (customer, rn)
WHERE a.rn = 1
AND a.fruitType = 'apple'
AND o.fruitType = 'orange';
dbfiddle here
But I would first seriously consider normalizing your DB design, which would make the task simpler and faster.
Answer 1 (score: 1)
This might be easier to do with window functions:
select distinct customer,
first_value(fruitId || '-' || purchaseDate) filter (where fruittype = 'apple') over (partition by customer order by purchaseDate) as appleKey,
first_value(fruitId || '-' || purchaseDate) filter (where fruittype = 'orange') over (partition by customer order by purchaseDate) as orangeKey
from fruitPurchases;
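Since PostgreSQL does not accept FILTER on non-aggregate window functions such as first_value(), you can also write it as: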
select distinct customer,
first_value(case when fruittype = 'apple' then fruitId || '-' || purchaseDate end) over (partition by customer order by (fruittype = 'apple')::int desc, purchaseDate) as appleKey,
first_value(case when fruittype = 'orange' then fruitId || '-' || purchaseDate end) over (partition by customer order by (fruittype = 'orange')::int desc, purchaseDate) as orangeKey
from fruitPurchases;
Here is a SQL Fiddle.