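SQL: Update GROUP BY to include a value based on the max of another column

Time: 2018-09-07 17:09:47

Tags: sql postgresql group-by

Question

When a query uses a GROUP BY clause with aggregate functions, how can I also return a specific value from another column?

Overview

Here is an example of my table: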

id  | year | quarter | wage | comp_id | comp_industry |
123 | 2012 | 1       | 1000 | 456     | abc           |
123 | 2012 | 1       | 2000 | 789     | def           |
123 | 2012 | 2       | 1500 | 789     | def           |
456 | 2012 | 1       | 2000 | 321     | ghi           |
456 | 2012 | 2       | 2000 | 321     | ghi           |

To calculate the sum of wage for each individual by year and quarter, I run the following query:

SELECT id, year, quarter, SUM(wage) AS sum_wage
FROM t1
GROUP BY id, year, quarter;

Result

id  | year | quarter | sum_wage | 
123 | 2012 | 1       | 3000     |
123 | 2012 | 2       | 1500     |
456 | 2012 | 1       | 2000     |
456 | 2012 | 2       | 2000     |

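Desired output

I would like to update the query to also include the comp_industry of the row where the individual's wage is highest for that year and quarter. I am not sure where to start so that I return only the industry in which each person earned the most in each year and quarter.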

id  | year | quarter | sum_wage | comp_industry
123 | 2012 | 1       | 3000     | def
123 | 2012 | 2       | 1500     | def
456 | 2012 | 1       | 2000     | ghi
456 | 2012 | 2       | 2000     | ghi

I have looked at "Get value based on max of a different column grouped by another column" and "Fetch the row which has the Max value for a column", but I am not sure where to go from there.

Any help or advice would be greatly appreciated!

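2 Answers:

Answer 0: (score: 1)

You can try using the window functions SUM and ROW_NUMBER together.

Number the rows within each (id, year, quarter) partition in descending order of wage, then keep only the rows where rn = 1.

Schema (PostgreSQL v9.6)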

CREATE TABLE T (
   id INT, 
   year INT,
   quarter INT,
   wage INT,
   comp_id INT,
  comp_industry VARCHAR(50)
);


INSERT INTO T VALUES (123 , 2012 , 1 , 1000 , 456    ,'abc');
INSERT INTO T VALUES (123 , 2012 , 1 , 2000 , 789    ,'def');
INSERT INTO T VALUES (123 , 2012 , 2 , 1500 , 789    ,'def');
INSERT INTO T VALUES (456 , 2012 , 1 , 2000 , 321    ,'ghi');
INSERT INTO T VALUES (456 , 2012 , 2 , 2000 , 321    ,'ghi');

Query #1

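SELECT id, year, quarter, sum_wage, comp_industry
FROM (
  SELECT *,
         -- total wage per person, year and quarter (no ORDER BY, so the whole partition is summed)
         SUM(wage) OVER (PARTITION BY id, year, quarter) AS sum_wage,
         -- rn = 1 marks the row with the highest wage in that partition
         ROW_NUMBER() OVER (PARTITION BY id, year, quarter ORDER BY wage DESC) AS rn
  FROM T
) t1
WHERE rn = 1;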

| id  | year | quarter | sum_wage | comp_industry |
| --- | ---- | ------- | -------- | ------------- |
| 123 | 2012 | 1       | 3000     | def           |
| 123 | 2012 | 2       | 1500     | def           |
| 456 | 2012 | 1       | 2000     | ghi           |
| 456 | 2012 | 2       | 2000     | ghi           |
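
As a PostgreSQL-specific alternative (a sketch, not part of the original answer), DISTINCT ON can take the place of the ROW_NUMBER subquery. It assumes the table T from the schema above. The trailing comp_industry sort key is an assumed tie-breaker, added only so that rows with equal wages resolve deterministically.

```sql
-- Window functions are evaluated before DISTINCT ON, so sum_wage still
-- covers every row of the (id, year, quarter) partition.
SELECT DISTINCT ON (id, year, quarter)
       id,
       year,
       quarter,
       SUM(wage) OVER (PARTITION BY id, year, quarter) AS sum_wage,
       comp_industry
FROM T
-- DISTINCT ON keeps the first row per group in this order:
-- highest wage first, then comp_industry as an assumed tie-breaker.
ORDER BY id, year, quarter, wage DESC, comp_industry;
```

DISTINCT ON is a PostgreSQL extension, so the ROW_NUMBER form is the more portable choice if the query may need to run on another database.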


Answer 1: (score: 1)

I'm not 100% sure I understand the question, but does this help?

SELECT id, 
       year, 
       quarter, 
       wage_sum AS sum_wage,
       comp_industry
  FROM (SELECT TMP.*,
               -- rank each person's rows within a quarter, highest wage first
               RANK() OVER
                 ( PARTITION BY id, 
                                year, 
                                quarter
                       ORDER BY wage DESC         
                 ) wage_rnk
          FROM (SELECT t1.*,
                       -- total wage per person, year and quarter
                       SUM(wage) OVER
                         ( PARTITION BY id, 
                                        year, 
                                        quarter 
                         ) wage_sum
                  FROM t1
               ) TMP
       ) TMP2
 WHERE wage_rnk = 1;
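
Another way to sketch the same idea, assuming the table t1 from the question: compute the totals with a plain GROUP BY, rank the individual rows in a second subquery, and join the two. The aliases s and r are illustrative only. Because RANK() keeps ties, two industries with the same top wage in a quarter would both be returned.

```sql
SELECT s.id, s.year, s.quarter, s.sum_wage, r.comp_industry
FROM (
    -- total wage per person, year and quarter
    SELECT id, year, quarter, SUM(wage) AS sum_wage
    FROM t1
    GROUP BY id, year, quarter
) s
JOIN (
    -- rank each row within its (id, year, quarter) group, highest wage first
    SELECT id, year, quarter, comp_industry,
           RANK() OVER (PARTITION BY id, year, quarter
                        ORDER BY wage DESC) AS wage_rnk
    FROM t1
) r
  ON  r.id = s.id
  AND r.year = s.year
  AND r.quarter = s.quarter
WHERE r.wage_rnk = 1
ORDER BY s.id, s.year, s.quarter;
```

Keeping the aggregate and the ranking in separate subqueries makes each step easy to test on its own, at the cost of reading t1 twice.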