Here is my sample input data.
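The user first watched video v1 from minute 0 to minute 5, then skipped ahead and watched from minute 7 to minute 10. He then came back and watched from minute 3 to minute 4. So his unique watch time is 5 minutes from the first row, 3 minutes from the second row, and 0 minutes from the third row, because the first row already covers it.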
Expected output:
(Explanation: 3 from row 4, + 0 from row 5 because it falls within row 4, + 1 from the last row)
I can implement this in Python, but I'm not sure whether it can be done in SQL.
Thanks for your help, and sorry if the formatting is poor.
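The asker's Python code isn't shown, so the following is only one possible sketch of the Python approach, not their implementation. It assumes each row is a (user_id, video_id, watch_start, watch_end) tuple with integer minutes. It sorts each user/video pair's intervals by start and merges overlapping ones so that every minute is counted once. The example uses the three v1 rows described in the question; user id 1 is taken from the answer's output.

```python
from collections import defaultdict


def unique_watch_time(rows):
    """rows: iterable of (user_id, video_id, watch_start, watch_end).

    Returns {(user_id, video_id): minutes watched at least once}.
    """
    by_key = defaultdict(list)
    for user_id, video_id, start, end in rows:
        by_key[(user_id, video_id)].append((start, end))

    result = {}
    for key, intervals in by_key.items():
        total = 0
        cur_start, cur_end = None, None
        for start, end in sorted(intervals):
            if cur_end is not None and start <= cur_end:
                # Overlaps (or touches) the current run: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap: bank the finished run and start a new one.
                if cur_end is not None:
                    total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start  # bank the last run
        result[key] = total
    return result


rows = [(1, "v1", 0, 5), (1, "v1", 7, 10), (1, "v1", 3, 4)]
print(unique_watch_time(rows))  # {(1, 'v1'): 8}
```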
Answer 0 (score: 1)
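This solution assumes your RDBMS supports common table expressions and window functions. The idea is to find the viewing periods that belong together. Within each person/video pair, rows are ordered by watch_start. A row joins the current group when its watch_start falls inside the previous row's viewing interval; otherwise it starts a new group. Each group then contributes its latest watch_end minus its earliest watch_start.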
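To see how this grouping works, the steps can be traced in plain Python for a single person/video partition. This is a sketch that mirrors the query's logic, not the query itself, and the function name is hypothetical. Rows are sorted by watch_start, and gp is 1 when a row starts outside the previous row's interval. A running sum of gp numbers the groups, and each group contributes MAX(watch_end) - MIN(watch_start).

```python
from itertools import accumulate


def lag_grouped_watch_time(intervals):
    """Mirror the answer's query for one (person_id, video_id) partition.

    intervals: list of (watch_start, watch_end) pairs.
    """
    rows = sorted(intervals)  # ORDER BY watch_start (ties broken by watch_end)

    # gp: 1 starts a new group; 0 when watch_start is BETWEEN the previous
    # row's start and end (the LAG comparison). The first row is always 1.
    gp = [1] + [
        0 if prev_start <= start <= prev_end else 1
        for (prev_start, prev_end), (start, _) in zip(rows, rows[1:])
    ]

    # Running SUM(gp) numbers the groups: 1, 1, 2, ...
    grps = list(accumulate(gp))

    # Per group: MIN(watch_start) and MAX(watch_end).
    spans = {}
    for grp, (start, end) in zip(grps, rows):
        lo, hi = spans.get(grp, (start, end))
        spans[grp] = (min(lo, start), max(hi, end))

    # SUM(max_we - min_ws) over the groups.
    return sum(hi - lo for lo, hi in spans.values())


print(lag_grouped_watch_time([(0, 5), (7, 10), (3, 4)]))  # prints 8
```

On the v1 rows from the question this gives 5 + 3 = 8. Because each row is compared only with the row immediately before it, an interval nested inside an earlier, longer one is not always caught. For (0, 10), (2, 3), (5, 6), the last row starts outside (2, 3) and opens a new group, so the result is 11 instead of 10.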
WITH cte AS
(
    -- gp = 1 starts a new group; 0 when watch_start lies inside the previous row's interval
    SELECT *, COALESCE(CASE WHEN watch_start BETWEEN lag_ws AND lag_we THEN 0 ELSE 1 END, 1) AS gp
    FROM
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY person_id, video_id ORDER BY watch_start) AS seq,
               LAG(watch_start, 1) OVER(PARTITION BY person_id, video_id ORDER BY watch_start) AS lag_ws,
               LAG(watch_end, 1) OVER(PARTITION BY person_id, video_id ORDER BY watch_start) AS lag_we
        FROM vids
    ) a1
)
-- add up the length of every group
SELECT person_id, video_id, SUM(max_we - min_ws) AS watch_time
FROM
(
    -- each group spans from its earliest start to its latest end
    SELECT person_id, video_id, MIN(watch_start) AS min_ws, MAX(watch_end) AS max_we
    FROM
    (
        -- running total of gp numbers the groups; the ORDER BY makes it a true running sum
        SELECT person_id, video_id, watch_start, watch_end,
               SUM(gp) OVER(PARTITION BY person_id, video_id ORDER BY watch_start
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grps
        FROM cte
    ) a2
    GROUP BY person_id, video_id, a2.grps
) a3
GROUP BY person_id, video_id;
Output:
person_id video_id watch_time
1 v1 8
2 v2 4
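If one viewing interval can sit entirely inside an earlier, longer one (for example (0, 10), (2, 3), (5, 6)), comparing each row only with the previous row can split a group that should stay merged. One variant, offered as a sketch and not as the answer's query, compares each watch_start with the largest watch_end of all earlier rows in the partition (MAX(...) OVER with a frame ending at 1 PRECEDING) instead of using LAG. The snippet runs it through Python's built-in sqlite3 module against an in-memory vids table. It assumes SQLite 3.25 or later (for window functions) and integer minute columns. The person 3 / v3 rows are hypothetical and exist only to exercise the nested case.

```python
import sqlite3

# Variant grouping: compare watch_start with the running maximum of all
# earlier watch_end values rather than only the previous row's (LAG).
VARIANT_QUERY = """
WITH flagged AS (
    SELECT person_id, video_id, watch_start, watch_end,
           CASE WHEN watch_start <= MAX(watch_end) OVER (
                    PARTITION BY person_id, video_id
                    ORDER BY watch_start, watch_end
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS gp  -- first row: MAX is NULL, so gp = 1
    FROM vids
),
grouped AS (
    SELECT *, SUM(gp) OVER (
                  PARTITION BY person_id, video_id
                  ORDER BY watch_start, watch_end
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM flagged
)
SELECT person_id, video_id, SUM(max_we - min_ws) AS watch_time
FROM (
    SELECT person_id, video_id, MIN(watch_start) AS min_ws, MAX(watch_end) AS max_we
    FROM grouped
    GROUP BY person_id, video_id, grp
) AS g
GROUP BY person_id, video_id
ORDER BY person_id, video_id
"""


def watch_time_sql(rows):
    """Load (person_id, video_id, watch_start, watch_end) rows into an
    in-memory SQLite table named vids and run the variant query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE vids (person_id INTEGER, video_id TEXT,"
            " watch_start INTEGER, watch_end INTEGER)"
        )
        con.executemany("INSERT INTO vids VALUES (?, ?, ?, ?)", rows)
        return con.execute(VARIANT_QUERY).fetchall()
    finally:
        con.close()


rows = [
    (1, "v1", 0, 5), (1, "v1", 7, 10), (1, "v1", 3, 4),  # v1 rows from the question
    (3, "v3", 0, 10), (3, "v3", 2, 3), (3, "v3", 5, 6),  # hypothetical nested case
]
print(watch_time_sql(rows))  # [(1, 'v1', 8), (3, 'v3', 10)]
```

The frame ending at 1 PRECEDING keeps the current row out of its own comparison, and ordering by watch_end as a tiebreaker makes the row order deterministic when two rows share a watch_start.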