I have a table like the one below, with the user ID, the product that was activated, the event that occurred, and the time at which the event occurred.
UID | Product | Event | Time
A   | C1      | F     | 2017-06-23
A   | C2      | S     | 2017-06-21
A   | C1      | S     | 2017-06-19
A   | C1      | S     | 2017-06-17
B   | C3      | F     | 2017-06-12
B   | C3      | S     | 2017-06-12
C   | C2      | F     | 2017-06-02
C   | C2      | F     | 2017-06-01
For each user and product, I want to find the time difference between the previous S event and the current F event:
UID | Product | Event | Time       | Days_Diff
A   | C1      | F     | 2017-06-23 | 4
A   | C2      | S     | 2017-06-21 | NULL
A   | C1      | S     | 2017-06-19 | NULL
A   | C1      | S     | 2017-06-17 | NULL
B   | C3      | F     | 2017-06-12 | 0
B   | C3      | S     | 2017-06-12 | NULL
C   | C2      | F     | 2017-06-02 | NULL
C   | C2      | F     | 2017-06-01 | NULL
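To pin down the expected semantics, here is a minimal Python sketch (not SQL, and not part of the original question) that reproduces the Days_Diff column above from the sample rows. It assumes Days_Diff is measured from the most recent earlier S for the same UID and Product, and that an S sharing a timestamp with an F counts as earlier, which is what the 0 on the B/C3 row implies. The function name `days_since_previous_s` is hypothetical.

```python
from datetime import date

# Sample rows from the question: (uid, product, event, time)
ROWS = [
    ("A", "C1", "F", date(2017, 6, 23)),
    ("A", "C2", "S", date(2017, 6, 21)),
    ("A", "C1", "S", date(2017, 6, 19)),
    ("A", "C1", "S", date(2017, 6, 17)),
    ("B", "C3", "F", date(2017, 6, 12)),
    ("B", "C3", "S", date(2017, 6, 12)),
    ("C", "C2", "F", date(2017, 6, 2)),
    ("C", "C2", "F", date(2017, 6, 1)),
]


def days_since_previous_s(rows):
    """For each F row, return the days since the most recent S on the same
    (uid, product); return None for S rows and for F rows with no earlier S."""
    # Visit rows chronologically; at equal timestamps an S is assumed to
    # precede an F (False sorts before True).
    order = sorted(range(len(rows)), key=lambda i: (rows[i][3], rows[i][2] != "S"))
    last_s = {}  # (uid, product) -> time of the most recent S seen so far
    result = [None] * len(rows)
    for i in order:
        uid, product, event, time = rows[i]
        key = (uid, product)
        if event == "S":
            last_s[key] = time
        elif event == "F" and key in last_s:
            result[i] = (time - last_s[key]).days
    return result


for row, diff in zip(ROWS, days_since_previous_s(ROWS)):
    print(*row[:3], row[3].isoformat(), diff)
```

The result list is aligned with the input order, so it can be compared directly against the Days_Diff column.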
I tried something like the following, but it doesn't help me keep track of the previous product and event:
SELECT UID, Product, Event, Time,
CASE
-- product is equal to last product
WHEN Product = LAG(Product, 1) OVER (PARTITION BY UID, Product ORDER BY Time)
-- current event = F and last event = S
AND Event = 'F' AND LAG(Event, 1) OVER (PARTITION BY UID, Product ORDER BY Time) = 'S'
-- subtract current time by the last time this product was activated
THEN DATEDIFF('DAY', MAX(Time) OVER (PARTITION BY UID, Product ORDER BY Time
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), Time)
END AS days_diff
FROM table
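However, this doesn't handle the case where the user activated the product but the S event is not immediately followed by the F event, for example: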
UID | Product | Event | Time       | Days_Diff
A   | C1      | F     | 2017-06-23 | 4
A   | C2      | S     | 2017-06-21 | NULL
A   | C1      | S     | 2017-06-19 | NULL
A   | C1      | S     | 2017-06-17 | NULL
How can I solve this?
Answer (score: 1):
You seem to want the time from the earliest "S" to the "F". If so:
SELECT UID, Product, Event, Time,
(CASE WHEN Event = 'F'
THEN DATEDIFF(DAY,
MIN(CASE WHEN Event = 'S' THEN Time END)
OVER (PARTITION BY UID
ORDER BY TIME
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
),
Time
)
END) AS days_diff
FROM table;
Note: this uses Redshift's DATEDIFF() function. As far as I know, its first argument is a datepart, not a string.
Edit:
I see, you want the "S" for the same product. That's a little different:
SELECT UID, Product, Event, Time,
       (CASE WHEN Event = 'F'
             THEN DATEDIFF(DAY,
                           -- most recent earlier 'S' for the same user AND product
                           MAX(CASE WHEN Event = 'S' THEN Time END)
                               OVER (PARTITION BY UID, Product
                                     -- at equal times, sort 'S' before 'F'
                                     ORDER BY Time, Event DESC
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                                    ),
                           Time
                          )
        END) AS days_diff
FROM table;
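To try the same-product idea without a Redshift cluster, the sketch below runs an equivalent query against the question's sample data in an in-memory SQLite database through Python's `sqlite3` module (window functions need SQLite 3.25 or later). It is an adaptation, not the answer's exact query: SQLite has no DATEDIFF, so `julianday(Time) - julianday(prev_s)` stands in for `DATEDIFF(DAY, prev_s, Time)`; the table is named `events` because TABLE is a reserved word; the window is partitioned by both UID and Product and ordered by `Time, Event DESC` so that an S sharing a timestamp with an F counts as earlier; and the window result is computed in a CTE before the difference is taken. The names `events`, `prev_s` and `run` are hypothetical.

```python
import sqlite3

# Sample data from the question; ISO-8601 date strings compare correctly as text.
ROWS = [
    ("A", "C1", "F", "2017-06-23"),
    ("A", "C2", "S", "2017-06-21"),
    ("A", "C1", "S", "2017-06-19"),
    ("A", "C1", "S", "2017-06-17"),
    ("B", "C3", "F", "2017-06-12"),
    ("B", "C3", "S", "2017-06-12"),
    ("C", "C2", "F", "2017-06-02"),
    ("C", "C2", "F", "2017-06-01"),
]

QUERY = """
WITH marked AS (
    SELECT UID, Product, Event, Time,
           -- latest 'S' strictly before this row for the same UID and Product
           MAX(CASE WHEN Event = 'S' THEN Time END)
               OVER (PARTITION BY UID, Product
                     ORDER BY Time, Event DESC  -- 'S' sorts before 'F' on ties
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_s
    FROM events
)
SELECT UID, Product, Event, Time,
       -- julianday() difference replaces Redshift's DATEDIFF(DAY, prev_s, Time);
       -- a NULL prev_s yields a NULL days_diff
       CASE WHEN Event = 'F'
            THEN CAST(julianday(Time) - julianday(prev_s) AS INTEGER)
       END AS days_diff
FROM marked
ORDER BY UID, Time DESC, Event
"""


def run():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (UID TEXT, Product TEXT, Event TEXT, Time TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run():
    print(row)
```

The final ORDER BY only mirrors the row order of the question's expected output; the days_diff values should match its Days_Diff column.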