SQL: Last non-equal event

Time: 2017-06-28 11:13:31

Tags: sql amazon-redshift

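I have a table like the one below: UID is the user id, Product is the product that was activated, Event is the event that occurred, and Time is when the event occurred.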

UID | Product | Event | Time
A      C1         F      2017-06-23
A      C2         S      2017-06-21
A      C1         S      2017-06-19
A      C1         S      2017-06-17
B      C3         F      2017-06-12
B      C3         S      2017-06-12
C      C2         F      2017-06-02
C      C2         F      2017-06-01

For each user and product, I want to find the time difference between the previous S event and the current F event.

UID | Product | Event | Time        | Days_Diff
A      C1         F      2017-06-23    4
A      C2         S      2017-06-21    NULL
A      C1         S      2017-06-19    NULL
A      C1         S      2017-06-17    NULL
B      C3         F      2017-06-12    0
B      C3         S      2017-06-12    NULL
C      C2         F      2017-06-02    NULL
C      C2         F      2017-06-01    NULL
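
The expected output above can be restated as a rule: for each F row, take the most recent S row with the same UID and Product that is not later than the F row, and report the gap in days; every other row gets NULL. Below is a minimal Python sketch of that rule, meant only as a reference for checking queries against the sample data. The function name `days_since_last_s` is hypothetical, and the "not later than" tie handling is an assumption inferred from the B/C3 rows, where the S and F events share a date and the expected difference is 0.

```python
from datetime import date

# Sample rows from the question: (UID, Product, Event, Time)
ROWS = [
    ("A", "C1", "F", date(2017, 6, 23)),
    ("A", "C2", "S", date(2017, 6, 21)),
    ("A", "C1", "S", date(2017, 6, 19)),
    ("A", "C1", "S", date(2017, 6, 17)),
    ("B", "C3", "F", date(2017, 6, 12)),
    ("B", "C3", "S", date(2017, 6, 12)),
    ("C", "C2", "F", date(2017, 6, 2)),
    ("C", "C2", "F", date(2017, 6, 1)),
]


def days_since_last_s(rows):
    """Return one Days_Diff per input row; None stands in for NULL."""
    result = []
    for uid, product, event, time in rows:
        if event != "F":
            # Only F rows get a difference.
            result.append(None)
            continue
        # S events for the same user and product, not later than this F
        # (assumption: a same-day S counts, giving a difference of 0).
        earlier_s = [
            t for u, p, e, t in rows
            if u == uid and p == product and e == "S" and t <= time
        ]
        result.append((time - max(earlier_s)).days if earlier_s else None)
    return result


print(days_since_last_s(ROWS))
```

The scan is quadratic, which is fine for a handful of sample rows; it is a specification aid, not a replacement for the window-function query.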

I tried something like the following, but it doesn't let me keep track of the previous product and event:

SELECT UID, Product, Event, Time,
       CASE 
       -- product is equal to last product
       WHEN Product = LAG(Product, 1) OVER (PARTITION BY UID, Product ORDER BY Time) 
       -- current event = F and last event = S
       AND Event = 'F' AND LAG(Event, 1) OVER (PARTITION BY UID, Product ORDER BY Time) = 'S' 
       -- subtract current time by the last time this product was activated
       THEN DATEDIFF('DAY', MAX(Time) OVER (PARTITION BY UID, Product ORDER BY Time 
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), Time) 
       END AS days_diff
FROM table

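However, this doesn't handle the case where a user activated a product and the S event is not immediately followed by the F event. For example, the case below: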

UID | Product | Event | Time        | Days_Diff
A      C1         F      2017-06-23    4
A      C2         S      2017-06-21    NULL
A      C1         S      2017-06-19    NULL
A      C1         S      2017-06-17    NULL

How can I solve this?

1 Answer:

Answer 0: (score: 1)

You seem to want the time from the earliest "S" to the "F". If so:

SELECT UID, Product, Event, Time,
       (CASE WHEN Event = 'F'
             THEN DATEDIFF(DAY,
                           MIN(CASE WHEN Event = 'S' THEN Time END) 
                               OVER (PARTITION BY UID
                                     ORDER BY TIME
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                                    ),
                           Time
                          )
        END) AS days_diff
FROM table;

Note: This uses Redshift's DATEDIFF() function. The first argument (as far as I know) is a datepart, not a string.

Edit:

I see, you want the most recent "S" on the same product. That's a bit different:

SELECT UID, Product, Event, Time,
       (CASE WHEN Event = 'F'
             THEN DATEDIFF(DAY,
                           MAX(CASE WHEN Event = 'S' THEN Time END)
                               OVER (PARTITION BY UID, Product
                                     ORDER BY TIME
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                                    ),
                           Time
                          )
        END) AS days_diff
FROM table;
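
A rough way to check the same-product variant against the sample data is to run an equivalent query in SQLite from Python. This is a sketch under several assumptions, not Redshift itself: the table is named `events` (hypothetical, since `table` is a reserved word), `DATEDIFF` is replaced by a `julianday()` difference because SQLite has no `DATEDIFF`, and window functions need SQLite 3.25 or later. The window is partitioned by both UID and Product so that the C2 event on 2017-06-21 does not count toward C1. Because B/C3 has an S and an F on the same date, the sketch also adds `Event DESC` as a tie-breaker so the S row sorts first; without some tie-breaker, the order of same-day rows is not guaranteed.

```python
import sqlite3

# Sample rows from the question: (UID, Product, Event, Time)
ROWS = [
    ("A", "C1", "F", "2017-06-23"),
    ("A", "C2", "S", "2017-06-21"),
    ("A", "C1", "S", "2017-06-19"),
    ("A", "C1", "S", "2017-06-17"),
    ("B", "C3", "F", "2017-06-12"),
    ("B", "C3", "S", "2017-06-12"),
    ("C", "C2", "F", "2017-06-02"),
    ("C", "C2", "F", "2017-06-01"),
]

# SQLite stand-in for the Redshift query: julianday() replaces DATEDIFF,
# and "Event DESC" makes a same-day S sort before the F (assumption).
QUERY = """
SELECT UID, Product, Event, Time,
       CASE WHEN Event = 'F'
            THEN CAST(julianday(Time) - julianday(
                     MAX(CASE WHEN Event = 'S' THEN Time END)
                         OVER (PARTITION BY UID, Product
                               ORDER BY Time, Event DESC
                               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                 ) AS INTEGER)
       END AS days_diff
FROM events
ORDER BY UID, Time DESC, Event
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (UID TEXT, Product TEXT, Event TEXT, Time TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", ROWS)

# Rows come back in the same order as the table in the question.
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On Redshift the same tie-breaker idea should carry over to the window's ORDER BY, but that has not been verified here.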