Can I get unique watch time using SQL?

Time: 2018-10-13 17:33:55

Tags: sql

This is my sample input data.


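The user_id first watches video v1 from minute 0 to minute 5, then skips ahead and watches from minute 7 to minute 10. He then comes back and watches minute 3 to minute 4 again. So his unique watch time is: 5 minutes from the first row, 3 minutes from the second row, and 0 minutes from the third row, because the first row already covers it.

Expected output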


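(Explanation: row 4 contributes 3, row 5 contributes 0 because row 4 already covers it, and the last row contributes 1.)

I can do this in Python, but I am not sure whether it can be done in SQL.

Thanks for your help, and sorry if the formatting is poor.

1 answer:

Answer 0: (score: 1)

This solution assumes that your RDBMS supports common table expressions and window functions. The idea is to find the watch periods that belong to the same group. Essentially, groups are defined so that a row whose watch_start falls inside the previous row's watch period is placed in the same "group" as that row.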

WITH cte AS
(
    -- gp = 0 when this row starts inside the previous row's watch period, otherwise 1 (a new group starts)
    SELECT *, COALESCE(CASE WHEN watch_start BETWEEN lag_ws AND lag_we THEN 0 ELSE 1 END, 1) AS gp
    FROM
    (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY person_id, video_id ORDER BY watch_start) AS seq,
              LAG(watch_start, 1) OVER(PARTITION BY person_id, video_id ORDER BY watch_start) AS lag_ws,
              LAG(watch_end, 1) OVER(PARTITION BY person_id, video_id ORDER BY watch_start) AS lag_we
    FROM vids
    ) a1
)

-- Sum the length of each group's span per person and video
SELECT person_id, video_id, SUM(max_we - min_ws) AS watch_time
FROM
(
    SELECT person_id, video_id, MIN(watch_start) AS min_ws, MAX(watch_end) AS max_we
    FROM
    (
    -- Running sum of gp gives each row its group number; ORDER BY makes the running sum deterministic
    SELECT person_id, video_id, watch_start, watch_end,
    SUM(gp) OVER(PARTITION BY person_id, video_id ORDER BY watch_start ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grps
    FROM cte
    ) a2
    GROUP BY person_id, video_id, a2.grps
) a3
GROUP BY person_id, video_id

Output:

person_id    video_id   watch_time
    1            v1              8    
    2            v2              4
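
For cross-checking the query result, the same grouping idea can be sketched in Python, which the question mentions as an alternative. This is only an illustrative sketch, not part of the original answer: the function name `unique_watch_time` and the tuple layout `(person_id, video_id, watch_start, watch_end)` are assumptions, and the only sample rows used are the three v1 rows described in the question. Unlike the query above, which compares each row with the immediately preceding row only, this sketch compares each row with the end of the merged period so far.

```python
from collections import defaultdict


def unique_watch_time(rows):
    """Return {(person_id, video_id): unique watch time}.

    rows: iterable of (person_id, video_id, watch_start, watch_end).
    The name and the row layout are hypothetical, chosen for this sketch.
    """
    by_key = defaultdict(list)
    for person_id, video_id, start, end in rows:
        by_key[(person_id, video_id)].append((start, end))

    totals = {}
    for key, intervals in by_key.items():
        intervals.sort()  # same ordering as ORDER BY watch_start
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Row starts inside the current group: extend the group if needed
                cur_end = max(cur_end, end)
            else:
                # Row starts a new group: close the previous one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start  # close the last group
        totals[key] = total
    return totals


# The three rows for person 1 / v1 as described in the question
rows = [(1, "v1", 0, 5), (1, "v1", 7, 10), (1, "v1", 3, 4)]
print(unique_watch_time(rows))  # {(1, 'v1'): 8}
```

For the v1 rows this gives 8, matching the output above. The two approaches can differ when a short period is followed by another period that still lies inside an earlier, longer one.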