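I have a table on which I want to do a simple sum of one field, grouped by two columns. I then also want the total of all values for each year_num.
See the example: http://rextester.com/QSLRS68794
This query throws: 42803: column "foo.num_cust" must appear in the GROUP BY clause or be used in an aggregate function, and I can't figure out why. Why does an aggregate function that uses OVER (PARTITION BY x) require the summed field to be in the GROUP BY?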
    select
        year_num
        ,age_bucket
        ,sum(num_cust)
        --,sum(num_cust) over (partition by year_num) --THROWS ERROR!!
    from
        foo
    group by
        year_num
        ,age_bucket
    order by 1,2
Table:
| loc_id | year_num | gen | cust_category | cust_age | num_cust | age_bucket |
|--------|-----------|------|----------------|-----------|-----------|-------------|
| 1 | 2016 | M | cash | 41 | 2 | 04_<45 |
| 1 | 2016 | F | Prepaid | 41 | 1 | 03_<35 |
| 1 | 2016 | F | cc | 61 | 1 | 05_45+ |
| 1 | 2016 | F | cc | 19 | 2 | 02_<25 |
| 1 | 2016 | M | cc | 64 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 46 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 27 | 3 | 03_<35 |
| 1 | 2016 | M | cash | 42 | 1 | 04_<45 |
| 1 | 2017 | F | cc | 35 | 1 | 04_<45 |
| 1 | 2017 | F | cc | 37 | 1 | 04_<45 |
| 1 | 2017 | F | cash | 46 | 1 | 05_45+ |
| 1 | 2016 | F | cash | 19 | 4 | 02_<25 |
| 1 | 2017 | M | cash | 43 | 1 | 04_<45 |
| 1 | 2017 | M | cash | 29 | 1 | 03_<35 |
| 1 | 2016 | F | cc | 13 | 1 | 01_<18 |
| 1 | 2017 | F | cash | 16 | 2 | 01_<18 |
| 1 | 2016 | F | cc | 17 | 2 | 01_<18 |
| 1 | 2016 | M | cc | 17 | 2 | 01_<18 |
| 1 | 2017 | F | cash | 18 | 9 | 02_<25 |
Expected output:
| year_num | age_bucket | sum | sum over (year_num) |
|----------|------------|-----|---------------------|
| 2016 | 01_<18 | 5 | 21 |
| 2016 | 02_<25 | 6 | 21 |
| 2016 | 03_<35 | 4 | 21 |
| 2016 | 04_<45 | 3 | 21 |
| 2016 | 05_45+ | 3 | 21 |
| 2017 | 01_<18 | 2 | 16 |
| 2017 | 02_<25 | 9 | 16 |
| 2017 | 03_<35 | 1 | 16 |
| 2017 | 04_<45 | 3 | 16 |
| 2017 | 05_45+ | 1 | 16 |
Answer 0 (score: 6)
You need to nest the sum()s:
    select year_num, age_bucket, sum(num_cust),
           sum(sum(num_cust)) over (partition by year_num) --WORKS!!
    from foo
    group by year_num, age_bucket
    order by 1, 2;
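Why? The window function itself does not aggregate. In an aggregation query, its argument must be an expression that can be evaluated after the group by. Because num_cust is not a group by key, it has to be wrapped in an aggregate function: the inner sum(num_cust) is the aggregate, and the outer sum(...) over (...) is the window function applied to the grouped rows.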
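To make the evaluation order concrete, here is a small sketch that runs on its own in PostgreSQL. The four rows in the CTE are a made-up subset standing in for foo, and the column aliases (bucket_total, year_total, buckets_in_year) are names chosen only for illustration.

```sql
-- Made-up, reduced stand-in for foo so the query runs without the real table.
with foo (year_num, age_bucket, num_cust) as (
    values (2016, '01_<18', 1),
           (2016, '01_<18', 2),
           (2016, '02_<25', 4),
           (2017, '01_<18', 2)
)
select year_num,
       age_bucket,
       -- ordinary aggregate: one value per (year_num, age_bucket) group
       sum(num_cust)                                   as bucket_total,
       -- window function over the grouped rows: adds up the group totals per year
       sum(sum(num_cust)) over (partition by year_num) as year_total,
       -- also a window function; it counts grouped rows, i.e. buckets per year
       count(*) over (partition by year_num)           as buckets_in_year
from foo
group by year_num, age_bucket
order by 1, 2;
```

Every expression handed to a window function here is either a grouping key or an aggregate, which is what lets it be evaluated after the group by.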
Perhaps this is clearer if you use a subquery:
    select year_num, age_bucket, sum_num_cust,
           sum(sum_num_cust) over (partition by year_num)
    from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
          from foo
          group by year_num, age_bucket
         ) ya
    order by 1, 2;
These two queries do exactly the same thing. With the subquery, though, it should be more obvious why the extra aggregation is needed.
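The nested form also combines with ordinary arithmetic. As a sketch only, assuming the table foo from the question exists and that a per-bucket share of the yearly total is wanted (the question does not ask for this; pct_of_year is a hypothetical alias), something like this should work in PostgreSQL:

```sql
-- Assumes the question's table foo(year_num, age_bucket, num_cust) exists.
select year_num,
       age_bucket,
       sum(num_cust) as bucket_total,
       sum(sum(num_cust)) over (partition by year_num) as year_total,
       -- 100.0 forces numeric division; integer / integer would truncate
       round(100.0 * sum(num_cust)
             / sum(sum(num_cust)) over (partition by year_num), 1) as pct_of_year
from foo
group by year_num, age_bucket
order by 1, 2;
```

The over clause binds only to the function call immediately before it, so the division is between the group total and the windowed year total.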