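I'm a SQL beginner and I'm having trouble finding the top 3 values within each category. The question is:
"For order_ids in January 2006, what are the 3 product_ids with the highest revenue in each category_id?"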
Table A:
(Column name)
customer_id
order_id
order_date
revenue
product_id
Table B:
product_id
category_id
I tried combining tables B and A with an inner join and filtering by order_date, but then I got stuck on how to find the top 3 values within each category_id. Thanks.
SELECT B.product_id, category_id FROM A
JOIN B ON B.product_id = A.product_id
WHERE order_date BETWEEN '2006-01-01' AND '2006-01-31'
ORDER BY revenue DESC
LIMIT 3;
Answer 0 (score: 2)
Queries like this are usually solved with window functions:
select *
from (
SELECT b.product_id,
b.category_id,
a.revenue,
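dense_rank() over (partition by b.category_id order by a.revenue desc) as rnk
from A
join b ON B.product_id = A.product_id
where a.order_date between date '2006-01-01' AND date '2006-01-31'
) as t
where rnk <= 3
order by product_id, category_id, revenue desc;
The partition is on category_id alone, so rows are ranked against each other within their category. dense_rank() also handles ties (products in the same category with the same revenue), so you may actually get more than 3 rows per category.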
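To make the tie behaviour concrete, here is a minimal sketch that runs the ranking on made-up data. It uses Python's built-in sqlite3 module instead of PostgreSQL purely so it can be run without a server. The table name sales, its rows, and the product_id tie-breaker in row_number() are all assumptions for illustration. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# In-memory database with a hypothetical, already-joined sales table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (category_id INTEGER, product_id INTEGER, revenue INTEGER);
    INSERT INTO sales VALUES
        (1, 10, 500), (1, 11, 400), (1, 12, 400), (1, 13, 300), (1, 14, 200);
""")

# Both rankings are computed per category; only the outer filter differs.
query = """
    SELECT product_id, revenue, rnk, rn
    FROM (
        SELECT product_id, revenue,
               dense_rank() OVER (PARTITION BY category_id
                                  ORDER BY revenue DESC) AS rnk,
               row_number() OVER (PARTITION BY category_id
                                  ORDER BY revenue DESC, product_id) AS rn
        FROM sales
    ) AS t
    WHERE {filter}
    ORDER BY revenue DESC, product_id
"""

# dense_rank gives tied rows the same rank, so the tie at 400 adds an extra row.
dense_top3 = conn.execute(query.format(filter="rnk <= 3")).fetchall()
# row_number numbers every row distinctly, so exactly three rows come back.
rownum_top3 = conn.execute(query.format(filter="rn <= 3")).fetchall()

print([row[0] for row in dense_top3])   # products 10, 11, 12, 13
print([row[0] for row in rownum_top3])  # products 10, 11, 12
```

Which one you want depends on whether tied products should all be reported or whether exactly three rows per category are required.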
If the same product can appear more than once in table b (for the same category), you need to combine this with GROUP BY to get the sum of all revenues:
select *
from (
SELECT b.product_id,
b.category_id,
sum(a.revenue) as total_revenue,
dense_rank() over (partition by b.category_id order by sum(a.revenue) desc) as rnk
from a
join b on B.product_id = A.product_id
where a.order_date between date '2006-01-01' AND date '2006-01-31'
group by b.product_id, b.category_id
) as t
where rnk <= 3
order by product_id, category_id, total_revenue desc;
When you combine window functions with GROUP BY, the window function is applied after the GROUP BY.
Answer 1 (score: 1)
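The evaluation order can be checked on a small data set. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than PostgreSQL, the rows are invented, and dates are stored as ISO-8601 text because SQLite has no date literal syntax. Revenue is first summed per product by the GROUP BY, and dense_rank() then ranks those sums within each category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (order_id INTEGER, order_date TEXT, revenue INTEGER, product_id INTEGER);
    CREATE TABLE b (product_id INTEGER, category_id INTEGER);
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 1), (13, 1), (20, 2);
    INSERT INTO a VALUES
        (1, '2006-01-03', 100, 10),
        (2, '2006-01-10', 250, 10),
        (3, '2006-01-05', 300, 11),
        (4, '2006-01-07', 200, 12),
        (5, '2006-01-09',  50, 13),
        (6, '2006-01-20', 400, 13),
        (7, '2006-02-01', 900, 12),  -- outside January 2006, filtered out
        (8, '2006-01-15',  70, 20);
""")

rows = conn.execute("""
    SELECT category_id, product_id, total_revenue, rnk
    FROM (
        SELECT b.category_id,
               b.product_id,
               sum(a.revenue) AS total_revenue,
               -- the window function sees one row per group, with the summed revenue
               dense_rank() OVER (PARTITION BY b.category_id
                                  ORDER BY sum(a.revenue) DESC) AS rnk
        FROM a
        JOIN b ON b.product_id = a.product_id
        WHERE a.order_date BETWEEN '2006-01-01' AND '2006-01-31'
        GROUP BY b.category_id, b.product_id
    ) AS t
    WHERE rnk <= 3
    ORDER BY category_id, rnk
""").fetchall()

for row in rows:
    print(row)
```

Product 13 has two small January orders (50 and 400) yet ranks first in category 1 because the ranking is over the summed 450, and product 12 drops out because its large February order is filtered away before grouping.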
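You can use a window function to rank the grouped revenue, then pull the top X in the outer query. I haven't worked with PostgreSQL, so the code below may be missing some shortcut features.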
WITH ByRevenue AS
(
--This creates a named result set (a CTE) that the statements below can query like a physical table
SELECT
B.category_id,
B.product_id,
MAX(A.revenue) as max_revenue
FROM
A
JOIN B ON B.product_id = A.product_id
WHERE
A.order_date BETWEEN '2006-01-01' AND '2006-01-31'
GROUP BY
B.category_id,B.product_id
)
,Normalized AS
(
--Pull data from the CTE above using normal SQL syntax and number the rows within each category with ROW_NUMBER so the limit can be applied.
SELECT
category_id,
product_id,
max_revenue,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY max_revenue DESC) as rn
FROM
ByRevenue
)
--Final query from stuff above with each category/product ranked by revenue
SELECT *
FROM Normalized
WHERE RN<=3;
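Answer 2 (score: 0)
Have you tried simply using FETCH FIRST n ROWS ONLY?
Note: I'm assuming your primary key is product_id, so I used it to join the two tables.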
-- Returns the 3 highest-revenue rows overall for the date range
SELECT B.category_id, A.product_id, A.revenue
FROM A
INNER JOIN B ON A.product_id = B.product_id
WHERE A.order_date BETWEEN DATE '2006-01-01' AND DATE '2006-01-31'
ORDER BY A.revenue DESC
FETCH FIRST 3 ROWS ONLY;
Answer 3 (score: 0)
For top-n queries, the first thing to try is a lateral join:
WITH categories as (
SELECT DISTINCT category_id
FROM B
)
SELECT categories.category_id, sub.product_id
FROM categories
JOIN LATERAL (
SELECT a.product_id
FROM B
JOIN A ON (a.product_id = b.product_id)
WHERE b.category_id = categories.category_id
AND order_date BETWEEN '2006-01-01' AND '2006-01-31'
GROUP BY a.product_id
ORDER BY sum(revenue) desc
LIMIT 3
) sub on true;