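I have a table named Program with the following columns:
ProgDate (Date), Episode (String), Impression_id (int), ProgName (String)
I want to find the total number of impressions for each date and episode. The following query works correctly: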
Select progdate, episode, count(distinct impression_id) Impression from Program where progname='BBC' group by progdate, episode order by progdate, episode;
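Result:
| ProgDate | Episode | Impression |
|----------|---------|------------|
| 20160919 | 1 | 5 |
| 20160920 | 1 | 15 |
| 20160921 | 1 | 10 |
| 20160922 | 1 | 5 |
| 20160923 | 2 | 25 |
| 20160924 | 2 | 10 |
| 20160925 | 2 | 25 |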
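I also want the cumulative total for each episode. I searched for how to compute a running total, but the approaches I found accumulate across all previous rows. I want the total to restart for each episode, like this: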
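| Date | Episode | Impression | CumulativeImpressionsPerChannel |
|----------|---------|------------|---------------------------------|
| 20160919 | 1 | 5 | 5 |
| 20160920 | 1 | 15 | 20 |
| 20160921 | 1 | 10 | 30 |
| 20160922 | 1 | 5 | 35 |
| 20160923 | 2 | 25 | 25 |
| 20160924 | 2 | 10 | 35 |
| 20160925 | 2 | 25 | 60 |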
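Answer 0 (score: 1)
Recent versions of Hive HQL support windowing and analytic functions (ref 1) (ref 2), including SUM() OVER().
Assuming you have such a version, I mimicked the syntax using PostgreSQL at SQL Fiddle: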
CREATE TABLE d
(ProgDate int, Episode int, Impression int)
;
INSERT INTO d
(ProgDate, Episode, Impression)
VALUES
(20160919, 1, 5),
(20160920, 1, 15),
(20160921, 1, 10),
(20160922, 1, 5),
(20160923, 2, 25),
(20160924, 2, 10),
(20160925, 2, 25)
;
Query 1:
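select
ProgDate, Episode, Impression
-- running total that restarts for each episode
, sum(Impression) over(partition by Episode order by ProgDate) CumImpsPerChannel
-- running total across all episodes, for comparison
, sum(Impression) over(order by ProgDate) CumOverall
from (
-- the original aggregation: distinct impressions per date and episode
Select progdate, episode, count(distinct impression_id) Impression
from Program
where progname='BBC'
group by progdate, episode
) d
-- sort the final output; an order by inside the subquery does not guarantee it
order by ProgDate, Episode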
Results (the CumOverall column is omitted):
| progdate | episode | impression | cumimpsperchannel |
|----------|---------|------------|-------------------|
| 20160919 | 1 | 5 | 5 |
| 20160920 | 1 | 15 | 20 |
| 20160921 | 1 | 10 | 30 |
| 20160922 | 1 | 5 | 35 |
| 20160923 | 2 | 25 | 25 |
| 20160924 | 2 | 10 | 35 |
| 20160925 | 2 | 25 | 60 |
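To check the windowed query locally without a Hive cluster, here is a minimal Python sketch that loads the sample rows into an in-memory SQLite database and runs both running totals. It is an illustration under stated assumptions, not the Hive setup itself. It assumes the SQLite library bundled with Python is version 3.25 or later (the first release with window functions). It also reads from the pre-aggregated sample table d rather than from Program. The function name running_totals is hypothetical.

```python
import sqlite3

# Sample rows copied from the CREATE TABLE / INSERT script in the answer.
ROWS = [
    (20160919, 1, 5),
    (20160920, 1, 15),
    (20160921, 1, 10),
    (20160922, 1, 5),
    (20160923, 2, 25),
    (20160924, 2, 10),
    (20160925, 2, 25),
]

# Same window expressions as the answer, reading from the sample table d.
QUERY = """
select
    ProgDate, Episode, Impression
  , sum(Impression) over (partition by Episode order by ProgDate) as CumImpsPerChannel
  , sum(Impression) over (order by ProgDate) as CumOverall
from d
order by ProgDate, Episode
"""


def running_totals():
    """Return (ProgDate, Episode, Impression, CumImpsPerChannel, CumOverall) rows."""
    # Assumption: the bundled SQLite is >= 3.25, otherwise OVER() is a syntax error.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table d (ProgDate int, Episode int, Impression int)")
        conn.executemany("insert into d values (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_totals():
        print(row)
```

The fourth value in each row restarts at every new episode because of partition by Episode, while the fifth keeps accumulating across episodes. That contrast is the difference between the total asked for and the overall running total the question was trying to avoid.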