SQL window excluding the current group?

Time: 2018-04-09 19:33:25

Tags: sql hiveql

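I am trying to produce rolled-up summaries of the following data, both for the group in question (here, the product) and for everything except that group. I think this can be done with window functions, but I am having trouble working out the syntax (Hive SQL, in my case).

I would like to aggregate the following data: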

+------------+---------+--------+
|    date    | product | rating |
+------------+---------+--------+
| 2018-01-01 | A       | 1      |
| 2018-01-02 | A       | 3      |
| 2018-01-20 | A       | 4      |
| 2018-01-27 | A       | 5      |
| 2018-01-29 | A       | 4      |
| 2018-02-01 | A       | 5      |
| 2017-01-09 | B       | NULL   |
| 2017-01-12 | B       | 3      |
| 2017-01-15 | B       | 4      |
| 2017-01-28 | B       | 4      |
| 2017-07-21 | B       | 2      |
| 2017-09-21 | B       | 5      |
| 2017-09-13 | C       | 3      |
| 2017-09-14 | C       | 4      |
| 2017-09-15 | C       | 5      |
| 2017-09-16 | C       | 5      |
| 2018-04-01 | C       | 2      |
| 2018-01-13 | D       | 1      |
| 2018-01-14 | D       | 2      |
| 2018-01-24 | D       | 3      |
| 2018-01-31 | D       | 4      |
+------------+---------+--------+

Aggregated result:

+------+-------+---------+----+------------+------------------+----------+
| year | month | product | ct | avg_rating | avg_rating_other | other_ct |
+------+-------+---------+----+------------+------------------+----------+
| 2018 |     1 | A       |  5 | 3.4        | 2.5              |        4 |
| 2018 |     2 | A       |  1 | 5          | NULL             |        0 |
| 2017 |     1 | B       |  4 | 3.6666667  | NULL             |        0 |
| 2017 |     7 | B       |  1 | 2          | NULL             |        0 |
| 2017 |     9 | B       |  1 | 5          | 4.25             |        4 |
| 2017 |     9 | C       |  4 | 4.25       | 5                |        1 |
| 2018 |     4 | C       |  1 | 2          | NULL             |        0 |
| 2018 |     1 | D       |  4 | 2.5        | 3.4              |        5 |
+------+-------+---------+----+------------+------------------+----------+

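I have also considered building two aggregations, one that includes the product in question and one that excludes it, but I ran into trouble creating a suitable join key.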
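For reference, the two-aggregation idea can be sketched roughly as follows: one aggregate per (year, month, product), one per (year, month), joined on year and month, with the product's own totals subtracted from the month totals. This is only a sketch under assumptions: the table is called t, the names per_product, per_month, yr, mon, rated_ct and rating_sum are made up for illustration, and it relies on Hive's / returning a double and on division by zero yielding NULL. On Hive versions that treat date as a reserved word, the column may need backtick quoting.

```sql
-- Sketch only: CTE and column names are hypothetical, table t is assumed.
with per_product as (
  -- one row per product per month
  select year(date) as yr, month(date) as mon, product,
         count(*)      as ct,          -- all rows, including NULL ratings
         count(rating) as rated_ct,    -- rows with a non-NULL rating
         sum(rating)   as rating_sum
  from t
  group by year(date), month(date), product
),
per_month as (
  -- one row per month, across all products
  select year(date) as yr, month(date) as mon,
         count(*)      as ct,
         count(rating) as rated_ct,
         sum(rating)   as rating_sum
  from t
  group by year(date), month(date)
)
select p.yr, p.mon, p.product,
       p.ct,
       p.rating_sum / p.rated_ct as avg_rating,
       -- month totals minus this product's totals = the other products
       (m.rating_sum - coalesce(p.rating_sum, 0)) /
       (m.rated_ct - p.rated_ct) as avg_rating_other,
       m.ct - p.ct as other_ct
from per_product p
join per_month m
  on p.yr = m.yr and p.mon = m.mon;
```

Because the join uses only equality on (yr, mon), it avoids an inequality such as "product <> product" in the ON clause, which older Hive releases do not accept.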

1 answer:

Answer 0: (score: 0)

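You can do this:

select year(date), month(date), product,
       count(*) as ct, avg(rating) as avg_rating,
       -- rows in the same month that belong to other products
       sum(count(*)) over (partition by year(date), month(date)) - count(*) as ct_other,
       -- (month total - this product's total) / (month's rated rows - this product's rated rows);
       -- count(rating) skips NULL ratings, as avg() does
       ((sum(sum(rating)) over (partition by year(date), month(date)) - sum(rating)) /
        (sum(count(rating)) over (partition by year(date), month(date)) - count(rating))
       ) as avg_other
from t
group by year(date), month(date), product;

The rating for the "other" products is a bit tricky. After the group by, each row is one product in one month, so you need to add up everything in the month, subtract the current row, and then compute the average as that sum divided by the matching count. The denominator uses count(rating) rather than count(*) so that NULL ratings are left out, just as avg() leaves them out. For example, in January 2018 the ratings total 27 over 9 rows; taking away A's 17 over 5 rows leaves 10 / 4 = 2.5 for the other products. When a month has no other products the expression is 0 / 0, which Hive evaluates to NULL. (ct_other and avg_other correspond to other_ct and avg_rating_other in the desired output.)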