I have this data in table my_table:
person_id datetime
1 2017-03-02 18:06:20
1 2017-03-02 18:05:10
1 2017-04-01 18:04:09
1 2017-03-02 19:06:50
1 2017-04-01 19:07:22
2 2017-03-03 18:09:15
2 2017-05-03 19:07:05
2 2017-05-03 20:19:08
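I need to count the number of persons per hour (not distinct). The catch is that I need an average, taken over days.
Imagine that 10 visitors came between 18:00 and 19:00 today, and 5 visitors came during the same hour yesterday. What is the average number of visitors over these two days? (10 + 5) / 2 = 15 / 2 = 7.5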
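The arithmetic in that example can be written out as a tiny Python sketch. The dictionary keys and variable names are made up for illustration; only the numbers 10 and 5 come from the example.

```python
# Hypothetical visitor counts for the 18:00-19:00 slot on two days.
visitors_per_day = {"yesterday": 5, "today": 10}

# Average over days = total visitors in the slot / number of days.
hourly_avg = sum(visitors_per_day.values()) / len(visitors_per_day)
print(hourly_avg)  # 7.5
```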
I expect this result:
person_id HOUR HOURLY_AVG_COUNT
1 18 1.5
1 19 1
1 20 0
2 18 1
2 19 1
2 20 1
I wrote the following query in Hive, but it computes the total number of persons per hour across all days:
SELECT person_id, HOUR(datetime), count(*)
FROM my_table
GROUP BY person_id, HOUR(datetime)
ORDER BY person_id
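To see why this query returns totals rather than averages, here is a rough Python equivalent of its grouping, run on the sample rows above. It is only an illustration of the grouping logic, not Hive code, and the variable names are my own.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the question: (person_id, datetime).
rows = [
    (1, "2017-03-02 18:06:20"),
    (1, "2017-03-02 18:05:10"),
    (1, "2017-04-01 18:04:09"),
    (1, "2017-03-02 19:06:50"),
    (1, "2017-04-01 19:07:22"),
    (2, "2017-03-03 18:09:15"),
    (2, "2017-05-03 19:07:05"),
    (2, "2017-05-03 20:19:08"),
]

# Equivalent of GROUP BY person_id, HOUR(datetime) with COUNT(*):
# the date is never part of the key, so all days are lumped together.
totals = Counter(
    (pid, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour) for pid, ts in rows
)
print(totals[(1, 18)])  # 3: the total over both days, not the 1.5 average
```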
Answer 0: (score: 2)
select person_id
      ,hour
      ,avg (hourly_cnt) as hourly_avg_count
from (select person_id
            ,hour (datetime) as hour
            ,count(*) as hourly_cnt
      from my_table
      group by person_id
              ,hour (datetime)
              ,date (datetime)
     ) t
group by person_id
        ,hour
order by person_id
        ,hour
;
+-----------+------+------------------+
| person_id | hour | hourly_avg_count |
+-----------+------+------------------+
| 1 | 18 | 1.5 |
| 1 | 19 | 1 |
| 2 | 18 | 1 |
| 2 | 19 | 1 |
| 2 | 20 | 1 |
+-----------+------+------------------+
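The two-level aggregation in this answer can be checked without Hive. The following is a minimal Python sketch of the same idea on the sample rows: first count per person, hour and date, then average those daily counts per person and hour. The variable names are illustrative only.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Sample rows from the question: (person_id, datetime).
rows = [
    (1, "2017-03-02 18:06:20"),
    (1, "2017-03-02 18:05:10"),
    (1, "2017-04-01 18:04:09"),
    (1, "2017-03-02 19:06:50"),
    (1, "2017-04-01 19:07:22"),
    (2, "2017-03-03 18:09:15"),
    (2, "2017-05-03 19:07:05"),
    (2, "2017-05-03 20:19:08"),
]

# Step 1: count rows per (person, hour, date), like the inner query.
per_day = Counter()
for pid, ts in rows:
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    per_day[(pid, dt.hour, dt.date())] += 1

# Step 2: average the daily counts per (person, hour), like the outer query.
daily_counts = defaultdict(list)
for (pid, hour, _day), cnt in per_day.items():
    daily_counts[(pid, hour)].append(cnt)

hourly_avg = {key: sum(c) / len(c) for key, c in sorted(daily_counts.items())}
for (pid, hour), avg in hourly_avg.items():
    print(pid, hour, avg)
```

As in the query result, only (person, hour) pairs that have at least one row appear, so there is no row for person 1 at hour 20.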
Answer 1: (score: 1)
If I understand correctly, you can use COUNT(DISTINCT) to get the average:
SELECT person_id, HOUR(datetime),
COUNT(*) / COUNT(DISTINCT DATE(datetime))
FROM my_table
GROUP BY person_id, HOUR(datetime)
ORDER BY person_id;
Note: this does not count days that have no rows for a given hour. Your question does not explain what to do in that case.
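To make the difference concrete, here is a hedged Python sketch comparing the two conventions on the sample rows. The function name `hourly_avg` and the flag `count_empty_days` are hypothetical, and the "count empty days" branch rests on an assumption of mine: that every date on which the person appears at all should count as a day.

```python
from datetime import datetime

# Sample rows from the question: (person_id, datetime).
rows = [
    (1, "2017-03-02 18:06:20"),
    (1, "2017-03-02 18:05:10"),
    (1, "2017-04-01 18:04:09"),
    (1, "2017-03-02 19:06:50"),
    (1, "2017-04-01 19:07:22"),
    (2, "2017-03-03 18:09:15"),
    (2, "2017-05-03 19:07:05"),
    (2, "2017-05-03 20:19:08"),
]
parsed = [(pid, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for pid, ts in rows]


def hourly_avg(pid, hour, count_empty_days):
    """Average number of rows per day for one person and one hour."""
    hits = [dt for p, dt in parsed if p == pid and dt.hour == hour]
    if count_empty_days:
        # Assumption: every date on which the person appears counts as a day.
        days = {dt.date() for p, dt in parsed if p == pid}
    else:
        # Same as COUNT(DISTINCT DATE(datetime)) within the (person, hour) group.
        days = {dt.date() for dt in hits}
    return len(hits) / len(days) if days else 0.0


print(hourly_avg(2, 18, count_empty_days=False))  # 1.0
print(hourly_avg(2, 18, count_empty_days=True))   # 0.5
```

The expected table in the question shows 1 for person 2 at hour 18, which matches the first convention, so the query above is probably what was wanted. The row for person 1 at hour 20 would still have to be produced separately.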