Given a table of movie start and end times, how do I count the number of movies being watched in each hour?

Asked: 2017-06-20 16:17:23

Tags: sql apache-spark-sql hiveql

I have a table like this:

start_timestamp        end_timestamp
2012-11-18 05:53:36.0  2012-11-18 7:46:40.0
2012-11-18 06:34:23.0  2012-12-18 09:21:57.0

I would like the output to look like this:

hour                   movies_being_played
2012-11-18 05:00:00.0  1
2012-11-18 06:00:00.0  2
2012-11-18 07:00:00.0  2
2012-11-18 08:00:00.0  1
2012-11-18 09:00:00.0  1

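So far I have tried setting each hour by hand and, for each one, counting the movies that started before that hour and subtracting those that had already ended.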

SELECT
COUNT(CASE WHEN HOUR(start_time) < 6 THEN 1 ELSE null END),
COUNT(CASE WHEN HOUR(start_time) < 7 THEN 1 ELSE null END) - COUNT(CASE WHEN HOUR(end_time) < 7 THEN 1 ELSE null END),
...
COUNT(CASE WHEN HOUR(start_time) < 9 THEN 1 ELSE null END) - COUNT(CASE WHEN HOUR(end_time) < 9 THEN 1 ELSE null END)
FROM table

How can I do this without setting each hour by hand, and get the result as a "long" table rather than a "wide" one?

1 Answer:

Answer 0: (score: 0)

My approach:

I recursively generate all the hours of the day and join the result with the movie table.
-- SQL Server (T-SQL) syntax.
-- Recursive CTE: one row per hour of the day, 00:00:00 through 23:00:00.
WITH mycte AS
(
    SELECT CAST('00:00:00' AS time) AS MyHour
    UNION ALL
    SELECT DATEADD(HOUR, 1, MyHour)
    FROM mycte
    WHERE MyHour < '23:00:00'
)
SELECT COUNT(*) AS movies_being_played
      ,CONVERT(DATETIME, CONVERT(CHAR(8), m.start_timestamp, 112) + ' ' + CONVERT(CHAR(8), mycte.MyHour, 108)) AS [hour]
  FROM mycte
  INNER JOIN yourmovietable AS m
  -- Compare hour buckets rather than exact times, so that a movie starting
  -- at 05:53 is counted in the 05:00 bucket.
  -- This only works when a movie starts and ends on the same day.
  ON DATEPART(HOUR, mycte.MyHour) BETWEEN DATEPART(HOUR, m.start_timestamp)
                                      AND DATEPART(HOUR, m.end_timestamp)
 GROUP BY CONVERT(DATETIME, CONVERT(CHAR(8), m.start_timestamp, 112) + ' ' + CONVERT(CHAR(8), mycte.MyHour, 108))
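
The same idea (generate hour buckets, then join on overlap) can be sketched so that it also handles movies running past midnight. The version below is only an illustration, not the query above: it uses Python's built-in sqlite3 module rather than SQL Server or Spark, the table name `movies` and the helper `movies_per_hour` are made up for the example, and the second sample row is assumed to end on 2012-11-18, as the expected output in the question suggests. A movie counts toward an hour if it started before the hour ends and was still running when the hour began.

```python
import sqlite3

# Sample rows mirroring the question's table. Assumption: the second movie
# ends on 2012-11-18 (the same day it starts).
ROWS = [
    ("2012-11-18 05:53:36", "2012-11-18 07:46:40"),
    ("2012-11-18 06:34:23", "2012-11-18 09:21:57"),
]

QUERY = """
WITH RECURSIVE
bounds(first_hour, last_end) AS (
    -- Earliest start truncated to the hour, and the latest end.
    SELECT strftime('%Y-%m-%d %H:00:00', MIN(start_timestamp)),
           MAX(end_timestamp)
    FROM movies
),
hours(hour) AS (
    SELECT first_hour FROM bounds
    UNION ALL
    -- Step one hour at a time until the latest end is covered.
    SELECT datetime(hours.hour, '+1 hour')
    FROM hours, bounds
    WHERE datetime(hours.hour, '+1 hour') <= bounds.last_end
)
SELECT h.hour, COUNT(*) AS movies_being_played
FROM hours AS h
JOIN movies AS m
  ON m.start_timestamp < datetime(h.hour, '+1 hour')  -- started before the hour ends
 AND m.end_timestamp >= h.hour                        -- still running when the hour begins
GROUP BY h.hour
ORDER BY h.hour
"""


def movies_per_hour(rows):
    """Return a list of (hour, movies_being_played) tuples in long format."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (start_timestamp TEXT, end_timestamp TEXT)")
    conn.executemany("INSERT INTO movies VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for hour, count in movies_per_hour(ROWS):
        print(hour, count)
```

Because this is an inner join, hours in which no movie is playing are left out of the result; a LEFT JOIN from `hours` with `COUNT(m.start_timestamp)` would keep them with a count of 0. The timestamps are compared as text, which relies on them all being in the zero-padded `YYYY-MM-DD HH:MM:SS` form.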