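I'm trying to write the logic for a query that will let me classify user activity:
• The table in question contains all user activity in slots of roughly 5 minutes (not every slot is exactly 5 minutes; some are about 3 minutes, others 4) and records the status each user was in during that slot.
• Users usually jump between statuses over the course of the day.
Problem: if a user spends more than 3 hours (180 minutes) continuously in the same status, that time must be reported as "Unclassified".
Current view of the table I'm working with: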
user_id record_date user_status
user1 9/3/2017 14:25 status_1
user1 9/3/2017 14:30 status_3
user1 9/3/2017 14:35 status_3
user1 9/3/2017 14:40 status_2
user1 9/3/2017 14:45 status_2
user1 9/3/2017 14:50 status_2
user1 9/3/2017 14:55 status_2
user1 9/3/2017 15:00 status_2
user1 9/3/2017 15:05 status_2
user1 9/3/2017 15:10 status_2
user1 9/3/2017 15:15 status_2
user1 9/3/2017 15:20 status_2
user1 9/3/2017 15:25 status_2
user1 9/3/2017 15:30 status_2
user1 9/3/2017 15:30 status_2
user1 9/3/2017 15:35 status_2
user1 9/3/2017 15:40 status_2
user1 9/3/2017 15:43 status_3
user1 9/3/2017 15:45 status_3
user1 9/3/2017 15:50 status_2
user1 9/3/2017 15:50 status_2
user1 9/3/2017 15:55 status_2
user1 9/3/2017 16:00 status_2
user1 9/3/2017 16:00 status_2
user1 9/3/2017 16:04 status_2
I started testing the following logic, but I got stuck once I realized that not every slot is exactly 5 minutes.
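SELECT user_id, record_date
    -- status_1: flag when the last 36 rows (36 x 5 min = 180 min) sum to 180+ minutes.
    -- Assumes status_1 / status_2 are per-row minute columns and that every slot is 5 minutes.
    ,CASE
        WHEN SUM(status_1) OVER (
            PARTITION BY user_id ORDER BY record_date ASC
            ROWS BETWEEN 35 PRECEDING AND CURRENT ROW
        ) >= 180
        THEN 1
        ELSE 0
    END AS unclassified_flag_1
    -- status_2: same check
    ,CASE
        WHEN SUM(status_2) OVER (
            PARTITION BY user_id ORDER BY record_date ASC
            ROWS BETWEEN 35 PRECEDING AND CURRENT ROW
        ) >= 180
        THEN 1
        ELSE 0
    END AS unclassified_flag_2
FROM table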
Any ideas for alternative logic would be appreciated.
Answer 0 (score: 0)
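Check out the LAG() window function:
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_LAG.html
You can attach the previous timestamp to each row, then take the interval between the two timestamps and group by status. The only downside is that the intervals will include time when the user was inactive, so you may want to throw out the large gaps.
For example: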
WITH status_intervals AS (
    SELECT
        user_id, user_status, record_date
        -- previous record's timestamp for the same user (NULL on the user's first row)
        ,LAG(record_date) OVER (PARTITION BY user_id ORDER BY record_date) AS last_date
    FROM
        table
)
SELECT
    user_id, user_status
    ,SUM(DATEDIFF(second, last_date, record_date)) AS total_time_in_status
FROM
    status_intervals
WHERE
    -- arbitrarily deciding that a gap of 15 min or more is likely inactivity;
    -- this also drops each user's first row, where last_date is NULL
    DATEDIFF(second, last_date, record_date) < 900
GROUP BY
    user_id, user_status
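
The query above totals time per status, but it does not isolate continuous stretches in a single status, which is what the 180-minute rule needs. One possible way to get there with the same LAG() idea is a "gaps and islands" pass: mark every row where the status changes, turn a running count of those marks into a run id, and measure each run from its first to its last timestamp. This is only a sketch under assumptions: "table" is the placeholder name used above, and the column names run_id, run_minutes and reported_status are made up for illustration.

WITH marked AS (
    SELECT
        user_id, record_date, user_status
        -- 1 when this row starts a new run: the user's first row, or a status
        -- different from the previous row (LAG is NULL on the first row)
        ,CASE
            WHEN LAG(user_status) OVER (PARTITION BY user_id ORDER BY record_date) = user_status
            THEN 0
            ELSE 1
        END AS is_new_run
    FROM
        table
),
runs AS (
    SELECT
        user_id, record_date, user_status
        -- running count of run starts, so every row in one run shares a run_id
        ,SUM(is_new_run) OVER (
            PARTITION BY user_id ORDER BY record_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS run_id
    FROM
        marked
)
SELECT
    user_id, user_status
    ,MIN(record_date) AS run_start
    ,MAX(record_date) AS run_end
    ,DATEDIFF(minute, MIN(record_date), MAX(record_date)) AS run_minutes
    -- more than 180 continuous minutes in one status is reported as Unclassified
    ,CASE
        WHEN DATEDIFF(minute, MIN(record_date), MAX(record_date)) > 180
        THEN 'Unclassified'
        ELSE user_status
    END AS reported_status
FROM
    runs
GROUP BY
    user_id, run_id, user_status

Because each run is measured from timestamps rather than by counting rows, it does not matter whether a slot is 3, 4 or 5 minutes long. On the sample data, the status_2 run from 14:40 to 15:40 comes out as 60 minutes, so nothing is flagged. Two caveats: a run is measured up to its last record rather than up to the first record of the next status, and long inactive gaps inside a run are not excluded, so you may still want a gap filter like the 15-minute one above.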