How do I find the three largest values in each category in PostgreSQL?

Time: 2019-06-18 00:44:45

Tags: sql postgresql greatest-n-per-group

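I'm a SQL beginner, and I'm having trouble finding the top 3 values in each category. The question is:

"For order_ids in January 2006, what are the 3 product_ids with the highest revenue in each category_id?"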

Table A:                    
(Column name)         
customer_id            
order_id              
order_date   
revenue  
product_id

Table B:  
product_id  
category_id

I tried combining tables B and A with an inner join and filtering by order_date, but then I got stuck on how to find the top 3 values within each category_id. Thanks.

This is what I've come up with so far:

SELECT B.product_id, category_id FROM A
JOIN B ON B.product_id = A.product_id
WHERE order_date BETWEEN '2006-01-01' AND '2006-01-31'
ORDER BY revenue DESC
LIMIT 3;

4 Answers:

Answer 0 (score: 2)

Queries like this are usually solved with window functions:

select *
from (
  select b.product_id,
         b.category_id,
         a.revenue,
         dense_rank() over (partition by b.category_id order by a.revenue desc) as rnk
  from a
    join b on b.product_id = a.product_id
  where a.order_date between date '2006-01-01' and date '2006-01-31'
) as t
where rnk <= 3
order by category_id, revenue desc;

The window is partitioned by category_id only, so the rank is computed within each category. dense_rank() also handles ties (products with the same revenue in the same category), so you may actually get more than 3 rows per category.
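To see how ties affect the number of rows returned, here is a minimal sketch that runs without any tables. The `sales` rows and the `p1` to `p5` product ids are made-up sample data, not taken from the question.

```sql
-- Hypothetical sample rows: five products in one category, two of them tied.
with sales (category_id, product_id, revenue) as (
  values (1, 'p1', 100),
         (1, 'p2', 100),
         (1, 'p3',  80),
         (1, 'p4',  60),
         (1, 'p5',  40)
)
select product_id,
       revenue,
       dense_rank() over w as dense_rnk,  -- ties share a rank, no gaps afterwards
       rank()       over w as rnk,        -- ties share a rank, gaps afterwards
       row_number() over w as rn          -- always unique; tied rows get an arbitrary order
from sales
window w as (partition by category_id order by revenue desc)
order by revenue desc, product_id;
```

With this data, filtering on dense_rnk <= 3 keeps four rows (p1 to p4), because the two tied products share rank 1. Filtering on rn <= 3 keeps exactly three rows, but which of the tied rows comes first is not defined unless a tie-breaker is added to the window's ORDER BY.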

If the same product can show up more than once for the same category, for example in several order rows in table a, you need to combine this with GROUP BY to get the sum of all revenues:

select *
from (
  select b.product_id,
         b.category_id,
         sum(a.revenue) as total_revenue,
         dense_rank() over (partition by b.category_id order by sum(a.revenue) desc) as rnk
  from a
    join b on b.product_id = a.product_id
  where a.order_date between date '2006-01-01' and date '2006-01-31'
  group by b.product_id, b.category_id
) as t
where rnk <= 3
order by category_id, total_revenue desc;

When window functions are combined with GROUP BY, the window function is applied after the GROUP BY, so it ranks the aggregated rows.
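The evaluation order can be checked with a small self-contained sketch. The `orders` rows below are invented for illustration; product p1 has two orders, so its rank depends on the summed revenue.

```sql
-- Hypothetical order rows: product p1 has two orders in category 1.
with orders (category_id, product_id, revenue) as (
  values (1, 'p1',  50),
         (1, 'p1',  70),
         (1, 'p2', 100),
         (1, 'p3',  30),
         (1, 'p4',  20),
         (2, 'p5',  10)
)
select *
from (
  select category_id,
         product_id,
         sum(revenue) as total_revenue,
         -- evaluated after grouping, so it ranks one row per product
         dense_rank() over (partition by category_id order by sum(revenue) desc) as rnk
  from orders
  group by category_id, product_id
) as t
where rnk <= 3
order by category_id, rnk;
```

Here p1 ranks first in category 1 with a total of 120, even though none of its individual orders exceeds p2's 100. p4 is dropped as the fourth product in category 1, and p5 is kept as the only product in category 2.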

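Answer 1 (score: 1)

You can use a window function to rank the grouped revenue and then pull the top X in the outer query. I haven't worked with PostgreSQL, so the query below may be missing some shortcut that it offers.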

WITH ByRevenue AS
(
    -- CTE: highest single revenue value per category/product in January 2006
    SELECT
        B.category_id,
        B.product_id,
        MAX(A.revenue) AS max_revenue
    FROM
        A
        JOIN B ON B.product_id = A.product_id
    WHERE
        A.order_date BETWEEN '2006-01-01' AND '2006-01-31'
    GROUP BY
        B.category_id, B.product_id
)
, Normalized AS
(
    -- Number the products within each category, highest revenue first
    SELECT
        category_id,
        product_id,
        max_revenue,
        ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY max_revenue DESC) AS rn
    FROM
        ByRevenue
)
-- Keep the top 3 products of each category
SELECT *
FROM Normalized
WHERE rn <= 3;

Answer 2 (score: 0)

Have you tried FETCH FIRST n ROWS ONLY?

Note: I'm assuming product_id is your primary key, so I used it to join the two tables.

-- Note: FETCH FIRST limits the whole result set, not each category
SELECT B.category_id, A.revenue
FROM A
INNER JOIN B ON A.product_id = B.product_id
WHERE A.order_date BETWEEN date '2006-01-01' AND date '2006-01-31'
ORDER BY A.revenue DESC
FETCH FIRST 3 ROWS ONLY;

Answer 3 (score: 0)

For top-n queries, the first thing to try is a lateral join:

WITH categories as (
    SELECT DISTINCT category_id
    FROM B
)
SELECT categories.category_id, sub.product_id
FROM categories
JOIN LATERAL (
    SELECT a.product_id
    FROM B
    JOIN A ON (a.product_id = b.product_id)
    WHERE b.category_id = categories.category_id
      AND order_date BETWEEN '2006-01-01' AND '2006-01-31'
    GROUP BY a.product_id
    ORDER BY sum(revenue) desc
    LIMIT 3
) sub on true;