How to get the hourly average number of unique people using Hive?

Time: 2017-07-05 12:36:12

Tags: sql hive

I have this data in the table my_table:

camera_id     person_id         datetime
1             1                 2017-03-02 18:06:20
1             1                 2017-03-02 18:05:10
1             1                 2017-04-01 18:04:09
2             1                 2017-03-02 19:06:50
2             2                 2017-03-02 19:07:22
2             2                 2017-03-02 19:09:15
2             3                 2017-05-03 19:07:05
2             4                 2017-05-03 19:19:08
2             5                 2017-05-03 19:20:18

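I need to calculate the hourly average number of UNIQUE people detected by each camera.

For example, take camera 2 and the time window from 19:00 to 20:00. The camera detected 2 unique people on 2017-03-02 and 3 unique people on 2017-05-03. So the answer is (2 + 3) / 2 = 2.5.

Expected result: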

camera_id   HOUR   HOURLY_AVG_COUNT
1           18     1
2           19     2.5
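
The definition above can be checked against the sample rows with a minimal Python sketch. It is only an illustration of the arithmetic, not Hive code; the function name `hourly_avg_unique` and the `ROWS` list are hypothetical helpers introduced here, and the sketch assumes timestamps always have the form `YYYY-MM-DD HH:MM:SS`.

```python
from collections import defaultdict

# Sample rows copied from my_table: (camera_id, person_id, datetime)
ROWS = [
    (1, 1, "2017-03-02 18:06:20"),
    (1, 1, "2017-03-02 18:05:10"),
    (1, 1, "2017-04-01 18:04:09"),
    (2, 1, "2017-03-02 19:06:50"),
    (2, 2, "2017-03-02 19:07:22"),
    (2, 2, "2017-03-02 19:09:15"),
    (2, 3, "2017-05-03 19:07:05"),
    (2, 4, "2017-05-03 19:19:08"),
    (2, 5, "2017-05-03 19:20:18"),
]


def hourly_avg_unique(rows):
    """Average, over dates, of the unique people seen per (camera, hour)."""
    # (camera_id, hour) -> date -> set of person_ids seen in that hour
    seen = defaultdict(lambda: defaultdict(set))
    for camera_id, person_id, ts in rows:
        date, time = ts.split(" ")
        hour = int(time[:2])
        seen[(camera_id, hour)][date].add(person_id)
    # Sum the per-date unique counts, then divide by the number of dates
    return {
        key: sum(len(people) for people in by_date.values()) / len(by_date)
        for key, by_date in seen.items()
    }


result = hourly_avg_unique(ROWS)
print(result)  # {(1, 18): 1.0, (2, 19): 2.5}
```

Only dates on which the camera recorded at least one detection in that hour enter the denominator, which matches the worked example for camera 2.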

1 answer:

Answer 0: (score: 1)

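-- Distinct (person, date, hour) combinations divided by distinct (date, hour)
-- combinations gives the average number of unique people per hourly window.
select      camera_id
           ,hour(datetime)                                          as hour
           ,count(distinct person_id,date(datetime),hour(datetime)) /
                count(distinct date(datetime),hour(datetime))       as hourly_avg_count

from        my_table

group by    camera_id
           ,hour(datetime)

order by    camera_id
;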

+-----------+------+------------------+
| camera_id | hour | hourly_avg_count |
+-----------+------+------------------+
| 1         | 18   | 1                |
| 2         | 19   | 2.5              |
+-----------+------+------------------+

P.S.

The day-and-hour key used inside count(distinct ...) can be expressed in either of the following ways:

  • date(datetime),hour(datetime)
  • substr(cast(datetime as string),1,13)
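
To illustrate why a ratio of two distinct counts yields the average, and why the first 13 characters of the timestamp work as a day-and-hour key, here is a rough Python sketch. It mirrors the shape of the query rather than running it; `hourly_avg_via_distinct_counts` and `SAMPLE_ROWS` are hypothetical names, and the sketch assumes timestamps are strings of the form `YYYY-MM-DD HH:MM:SS`.

```python
# Sample rows copied from my_table: (camera_id, person_id, datetime)
SAMPLE_ROWS = [
    (1, 1, "2017-03-02 18:06:20"),
    (1, 1, "2017-03-02 18:05:10"),
    (1, 1, "2017-04-01 18:04:09"),
    (2, 1, "2017-03-02 19:06:50"),
    (2, 2, "2017-03-02 19:07:22"),
    (2, 2, "2017-03-02 19:09:15"),
    (2, 3, "2017-05-03 19:07:05"),
    (2, 4, "2017-05-03 19:19:08"),
    (2, 5, "2017-05-03 19:20:18"),
]


def hourly_avg_via_distinct_counts(rows):
    """Ratio of distinct (person, day-hour) pairs to distinct day-hours."""
    person_hours = {}  # (camera_id, hour) -> set of (person_id, day_hour)
    day_hours = {}     # (camera_id, hour) -> set of day_hour
    for camera_id, person_id, ts in rows:
        day_hour = ts[:13]     # 'YYYY-MM-DD HH', like substr(..., 1, 13)
        hour = int(ts[11:13])  # the grouping column, like hour(datetime)
        key = (camera_id, hour)
        person_hours.setdefault(key, set()).add((person_id, day_hour))
        day_hours.setdefault(key, set()).add(day_hour)
    # Numerator and denominator correspond to the two count(distinct ...) terms
    return {k: len(person_hours[k]) / len(day_hours[k]) for k in day_hours}


averages = hourly_avg_via_distinct_counts(SAMPLE_ROWS)
print(averages)  # {(1, 18): 1.0, (2, 19): 2.5}
```

Each person is counted once per day-hour window, so the numerator equals the sum of the per-window unique counts and the denominator equals the number of windows.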