I have this table, my_table:
recorder_id  person_id  day
A1           1          2017-06-03 12:30
A1           1          2017-06-03 12:45
B1           1          2017-06-03 12:50
A1           2          2017-06-03 16:40
B1           2          2017-06-03 16:45
B1           2          2017-06-03 18:20
A1           1          2017-06-04 11:22
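I want to know how many times per day, on average, each person passes each recorder. For example, the person with ID 1 passes recorder A1 an average of 1.5 times per day, while person 2 passes it an average of 0.5 times per day (because this person has no records for 2017-06-04). The same logic should apply to B1.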
recorder_id  person_id  daily_average_per_person
A1           1          1.5
A1           2          0.5
B1           1          0.5
B1           2          1.0
How can I get this result?
I tried this query, but I don't know how to calculate the daily average for each unique person:
SELECT recorder_id, person_id,
       to_date(day) as hour,
       count(*) as hourly_count
FROM my_table
GROUP BY recorder_id, person_id, to_date(day)
ORDER BY hourly_count;
Answer 0: (score: 3)
You're really close. I would use a subselect like this:
SELECT recorder_id, person_id, avg(day_count) day_avg
FROM
  ( SELECT recorder_id, person_id,
           to_date(day) as record_day,
           count(*) as day_count
    FROM my_table
    GROUP BY recorder_id, person_id, to_date(day) ) tmp_tbl
GROUP BY recorder_id, person_id
ORDER BY avg(day_count);
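To see what this nested query returns on the sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here: `date(day)` replaces Hive's `to_date(day)`, the ORDER BY is changed to sort by recorder and person for readability, and the function name `avg_over_active_days` is made up for illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("A1", 1, "2017-06-03 12:30"),
    ("A1", 1, "2017-06-03 12:45"),
    ("B1", 1, "2017-06-03 12:50"),
    ("A1", 2, "2017-06-03 16:40"),
    ("B1", 2, "2017-06-03 16:45"),
    ("B1", 2, "2017-06-03 18:20"),
    ("A1", 1, "2017-06-04 11:22"),
]


def avg_over_active_days(rows):
    """Average the per-day counts of each (recorder, person) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (recorder_id TEXT, person_id INTEGER, day TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    # Inner query: one row per recorder, person and calendar day.
    # Outer query: average of those daily counts per recorder and person.
    query = """
        SELECT recorder_id, person_id, AVG(day_count) AS day_avg
        FROM (SELECT recorder_id, person_id,
                     date(day) AS record_day,  -- SQLite stand-in for to_date(day)
                     COUNT(*) AS day_count
              FROM my_table
              GROUP BY recorder_id, person_id, date(day)) tmp_tbl
        GROUP BY recorder_id, person_id
        ORDER BY recorder_id, person_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in avg_over_active_days(ROWS):
        print(row)
```

On this data the sketch yields 1.5 for (A1, 1), 1.0 for (A1, 2), 1.0 for (B1, 1) and 2.0 for (B1, 2). The average is taken only over days on which a pair has at least one record, so days with no passes are not counted as zeros, which is why (A1, 2) comes out as 1.0 rather than the 0.5 in the question's expected output.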
I apologize, I'm not somewhere I can test this, but it should put you on the right path.
Good luck!
Answer 1: (score: 1)
If I understand correctly, you just need the number of days in the data. That becomes the denominator:
SELECT recorder_id, person_id,
       count(*) / numdays
FROM t CROSS JOIN
     (SELECT COUNT(DISTINCT to_date(day)) as numdays
      FROM t
     ) tt
GROUP BY recorder_id, person_id, numdays
ORDER BY recorder_id, person_id;
In other databases, you could use COUNT(DISTINCT) as a window function. I don't think Hive supports this.
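As a rough check of the cross-join approach against the sample rows in the question, the following sketch runs an equivalent query in an in-memory SQLite database from Python. Several details are assumptions made for the sketch: SQLite stands in for Hive, `date(day)` replaces `to_date(day)`, the table is named my_table as in the question rather than t, the result column is given the alias daily_average_per_person, and the function name `avg_over_all_days` is hypothetical. The `1.0 *` factor is there only because SQLite performs integer division when both operands are integers.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("A1", 1, "2017-06-03 12:30"),
    ("A1", 1, "2017-06-03 12:45"),
    ("B1", 1, "2017-06-03 12:50"),
    ("A1", 2, "2017-06-03 16:40"),
    ("B1", 2, "2017-06-03 16:45"),
    ("B1", 2, "2017-06-03 18:20"),
    ("A1", 1, "2017-06-04 11:22"),
]


def avg_over_all_days(rows):
    """Divide each pair's record count by the number of distinct days in the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (recorder_id TEXT, person_id INTEGER, day TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    # The cross-joined subquery returns a single row holding the number of
    # distinct calendar days, which serves as the denominator for every group.
    query = """
        SELECT recorder_id, person_id,
               1.0 * COUNT(*) / numdays AS daily_average_per_person
        FROM my_table CROSS JOIN
             (SELECT COUNT(DISTINCT date(day)) AS numdays
              FROM my_table) tt
        GROUP BY recorder_id, person_id, numdays
        ORDER BY recorder_id, person_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in avg_over_all_days(ROWS):
        print(row)
```

With two distinct days in the data, this gives 1.5 for (A1, 1), 0.5 for (A1, 2), 0.5 for (B1, 1) and 1.0 for (B1, 2), which matches the expected output in the question.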