Why does a field have to be in GROUP BY when using OVER(PARTITION BY x)?

Posted: 2017-10-06 19:24:09

Tags: sql postgresql aggregate-functions window-functions

I have a table where I want to do a simple sum of a field, grouped by two columns. I then also want the total of all those values for each year_num.

See the example: http://rextester.com/QSLRS68794

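This query throws `42803: column "foo.num_cust" must appear in the GROUP BY clause or be used in an aggregate function`, and I can't figure out why. Why does an aggregate function used with OVER(PARTITION BY x) require the summed field to be in the GROUP BY?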

```sql
select 
    year_num
    ,age_bucket
    ,sum(num_cust)
    --,sum(num_cust) over (partition by year_num)  --THROWS ERROR!!
from
    foo
group by
    year_num
    ,age_bucket
order by 1,2
```

Table:

| loc_id |  year_num |  gen |  cust_category |  cust_age |  num_cust |  age_bucket |
|--------|-----------|------|----------------|-----------|-----------|-------------|
| 1      | 2016      | M    | cash           | 41        | 2         | 04_<45      |
| 1      | 2016      | F    | Prepaid        | 41        | 1         | 03_<35      |
| 1      | 2016      | F    | cc             | 61        | 1         | 05_45+      |
| 1      | 2016      | F    | cc             | 19        | 2         | 02_<25      |
| 1      | 2016      | M    | cc             | 64        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 46        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 27        | 3         | 03_<35      |
| 1      | 2016      | M    | cash           | 42        | 1         | 04_<45      |
| 1      | 2017      | F    | cc             | 35        | 1         | 04_<45      |
| 1      | 2017      | F    | cc             | 37        | 1         | 04_<45      |
| 1      | 2017      | F    | cash           | 46        | 1         | 05_45+      |
| 1      | 2016      | F    | cash           | 19        | 4         | 02_<25      |
| 1      | 2017      | M    | cash           | 43        | 1         | 04_<45      |
| 1      | 2017      | M    | cash           | 29        | 1         | 03_<35      |
| 1      | 2016      | F    | cc             | 13        | 1         | 01_<18      |
| 1      | 2017      | F    | cash           | 16        | 2         | 01_<18      |
| 1      | 2016      | F    | cc             | 17        | 2         | 01_<18      |
| 1      | 2016      | M    | cc             | 17        | 2         | 01_<18      |
| 1      | 2017      | F    | cash           | 18        | 9         | 02_<25      |

Desired output:

| year_num | age_bucket | sum | sum over (year_num) |
|----------|------------|-----|---------------------|
| 2016     | 01_<18     | 5   | 21                  |
| 2016     | 02_<25     | 6   | 21                  |
| 2016     | 03_<35     | 4   | 21                  |
| 2016     | 04_<45     | 3   | 21                  |
| 2016     | 05_45+     | 3   | 21                  |
| 2017     | 01_<18     | 2   | 16                  |
| 2017     | 02_<25     | 9   | 16                  |
| 2017     | 03_<35     | 1   | 16                  |
| 2017     | 04_<45     | 3   | 16                  |
| 2017     | 05_45+     | 1   | 16                  |

1 answer:

Answer 0 (score: 6)

You need to nest the sum()s:

```sql
select year_num, age_bucket, sum(num_cust),
       sum(sum(num_cust)) over (partition by year_num)  --WORKS!!
from foo
group by year_num, age_bucket
order by 1, 2;
```
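To check the nested query against the desired output, here is a minimal sketch that loads the sample rows into an in-memory database and runs the query unchanged. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later is needed for window functions) and keeps only the three columns the query uses.

```python
import sqlite3

# Sample data from the question's table, reduced to the columns the query
# uses: (year_num, age_bucket, num_cust). SQLite stands in for PostgreSQL here.
ROWS = [
    (2016, "04_<45", 2), (2016, "03_<35", 1), (2016, "05_45+", 1),
    (2016, "02_<25", 2), (2016, "05_45+", 1), (2016, "05_45+", 1),
    (2016, "03_<35", 3), (2016, "04_<45", 1), (2017, "04_<45", 1),
    (2017, "04_<45", 1), (2017, "05_45+", 1), (2016, "02_<25", 4),
    (2017, "04_<45", 1), (2017, "03_<35", 1), (2016, "01_<18", 1),
    (2017, "01_<18", 2), (2016, "01_<18", 2), (2016, "01_<18", 2),
    (2017, "02_<25", 9),
]

# The answer's query: the inner sum() is the GROUP BY aggregate,
# the outer sum() ... over (...) is the window function.
NESTED_QUERY = """
select year_num, age_bucket, sum(num_cust),
       sum(sum(num_cust)) over (partition by year_num)
from foo
group by year_num, age_bucket
order by 1, 2
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table foo (year_num integer, age_bucket text, num_cust integer)")
conn.executemany("insert into foo values (?, ?, ?)", ROWS)

result = conn.execute(NESTED_QUERY).fetchall()
conn.close()

for row in result:
    print(row)  # (year_num, age_bucket, per-bucket sum, per-year total)
```

Each printed row corresponds to one row of the desired output table in the question.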

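Why? Because the window function is not what performs the GROUP BY aggregation; it is evaluated after it. In an aggregate query, a window function runs over the rows that the GROUP BY produces (one row per year_num, age_bucket), not over the original rows of foo. Its argument therefore has to be an expression that can be evaluated on those grouped rows, i.e. a GROUP BY key or an aggregate. Since num_cust is not a GROUP BY key, it has to be wrapped in an aggregate function. In sum(sum(num_cust)) over (partition by year_num), the inner sum() is the ordinary per-group aggregate, and the outer sum() over (...) adds up those per-group totals within each year_num.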
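The evaluation order can also be mimicked without a database. The following is a hypothetical two-step Python sketch of the logical order only (not how PostgreSQL actually executes queries; the function names are made up for illustration): step 1 does the GROUP BY, step 2 computes the windowed sum over the grouped rows. By step 2 the individual num_cust values no longer exist, which is why the window function's argument must be a grouping key or an aggregate.

```python
from collections import defaultdict

# (year_num, age_bucket, num_cust) rows from the question's table.
ROWS = [
    (2016, "04_<45", 2), (2016, "03_<35", 1), (2016, "05_45+", 1),
    (2016, "02_<25", 2), (2016, "05_45+", 1), (2016, "05_45+", 1),
    (2016, "03_<35", 3), (2016, "04_<45", 1), (2017, "04_<45", 1),
    (2017, "04_<45", 1), (2017, "05_45+", 1), (2016, "02_<25", 4),
    (2017, "04_<45", 1), (2017, "03_<35", 1), (2016, "01_<18", 1),
    (2017, "01_<18", 2), (2016, "01_<18", 2), (2016, "01_<18", 2),
    (2017, "02_<25", 9),
]


def group_by_year_bucket(rows):
    """Hypothetical step 1, like GROUP BY year_num, age_bucket with sum(num_cust)."""
    sums = defaultdict(int)
    for year, bucket, num_cust in rows:
        sums[(year, bucket)] += num_cust
    # From here on only the keys and the aggregate survive; the raw
    # num_cust values are gone, just as after a SQL GROUP BY.
    # sorted() orders by (year, bucket), like "order by 1, 2".
    return [(year, bucket, total) for (year, bucket), total in sorted(sums.items())]


def window_sum_by_year(grouped):
    """Hypothetical step 2, like sum(<group sum>) over (partition by year_num)."""
    per_year = defaultdict(int)
    for year, _bucket, group_sum in grouped:
        per_year[year] += group_sum
    # Unlike GROUP BY, the window keeps every input row and attaches the partition total.
    return [(year, bucket, group_sum, per_year[year]) for year, bucket, group_sum in grouped]


result = window_sum_by_year(group_by_year_bucket(ROWS))
for row in result:
    print(row)
```

Step 2 only ever receives the output of step 1, which mirrors why the window function can reference sum(num_cust) but not num_cust itself.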

Perhaps this is clearer with a subquery:

```sql
select year_num, age_bucket, sum_num_cust,
       sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
      from foo
      group by year_num, age_bucket
     ) ya
order by 1, 2;
```

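The two queries are equivalent. With the subquery, though, it should be more obvious why the extra aggregation is needed: the outer query only sees sum_num_cust, one value per (year_num, age_bucket) group.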
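The equivalence of the two queries can be checked on the sample data. This sketch again assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL; it runs both queries from the answer, unchanged apart from the trailing semicolons, and compares their results.

```python
import sqlite3

# (year_num, age_bucket, num_cust) rows from the question's table.
ROWS = [
    (2016, "04_<45", 2), (2016, "03_<35", 1), (2016, "05_45+", 1),
    (2016, "02_<25", 2), (2016, "05_45+", 1), (2016, "05_45+", 1),
    (2016, "03_<35", 3), (2016, "04_<45", 1), (2017, "04_<45", 1),
    (2017, "04_<45", 1), (2017, "05_45+", 1), (2016, "02_<25", 4),
    (2017, "04_<45", 1), (2017, "03_<35", 1), (2016, "01_<18", 1),
    (2017, "01_<18", 2), (2016, "01_<18", 2), (2016, "01_<18", 2),
    (2017, "02_<25", 9),
]

# Window function over an aggregate, in a single query.
NESTED_QUERY = """
select year_num, age_bucket, sum(num_cust),
       sum(sum(num_cust)) over (partition by year_num)
from foo
group by year_num, age_bucket
order by 1, 2
"""

# Same logic with the GROUP BY moved into a subquery.
SUBQUERY_QUERY = """
select year_num, age_bucket, sum_num_cust,
       sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
      from foo
      group by year_num, age_bucket
     ) ya
order by 1, 2
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table foo (year_num integer, age_bucket text, num_cust integer)")
conn.executemany("insert into foo values (?, ?, ?)", ROWS)

nested = conn.execute(NESTED_QUERY).fetchall()
via_subquery = conn.execute(SUBQUERY_QUERY).fetchall()
conn.close()

print(nested == via_subquery)
```

In the subquery form the inner sum() is spelled out as its own step, so the outer window function visibly operates on already aggregated rows.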