Min/max timestamps for a slowly changing dimension

Asked: 2019-01-01 17:48:59

Tags: sql amazon-redshift gaps-and-islands

I'm looking for ideas on a SQL (Redshift) query. Basically, I have the following table:

userid | timestamp           | fruit
  1    | 2018-12-10T14:46:50 | banana 
  1    | 2018-12-10T15:46:50 | banana
  1    | 2018-12-10T16:46:50 | apple
  1    | 2018-12-10T17:46:50 | banana

Is it possible to produce a new table containing the following information?

userid | start               | end                 | fruit
  1    | 2018-12-10T14:46:50 | 2018-12-10T16:46:50 | banana 
  1    | 2018-12-10T16:46:50 | 2018-12-10T17:46:50 | apple
  1    | 2018-12-10T17:46:50 |                     | banana

It should show the time ranges during which the user kept the same favorite fruit choice.

Thanks!

D

2 answers:

Answer 0 (score: 3)

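This is a typical gaps-and-islands problem. It can be solved with the lag and lead analytic functions: lag flags the rows where the fruit changes, a running sum of those flags labels each island, and lead supplies the end timestamp.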

with fruits(userid,timestamp,fruit) as (
values 
  (1,'2018-12-10T14:46:50','banana'), 
  (1,'2018-12-10T15:46:50','banana'),
  (1,'2018-12-10T16:46:50','apple'),
  (1,'2018-12-10T17:46:50','banana')    
)
 select userid, min(timestamp) as start, max(ld) as end, fruit
   from
   (
        select f2.*,
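               -- running count of fruit changes; Redshift requires an explicit
               -- frame clause for an aggregate window function with ORDER BY
               sum(case when lg = fruit then 0 else 1 end) over
                         (partition by userid, fruit order by timestamp
                          rows unbounded preceding) sm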
          from
          (
             select f1.*,
                    lead(timestamp) over (partition by userid order by timestamp) as ld,
                    lag(fruit) over (partition by userid order by timestamp) as lg
               from fruits f1
          ) f2
   ) f    
  group by userid, fruit, sm
  order by start;

userid     start                    end           fruit
------- ------------------- -------------------   ------
  1     2018-12-10T14:46:50 2018-12-10T16:46:50   banana
  1     2018-12-10T16:46:50 2018-12-10T17:46:50   apple
  1     2018-12-10T17:46:50         NULL          banana

Rextester Demo
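
To see why grouping by `sm` separates the two banana periods, it can help to look at the intermediate columns before the final `group by`. The following is only a sketch: it assumes Python's built-in `sqlite3` module with SQLite 3.25 or later (for window functions) instead of Redshift, and the function name `island_labels` is made up for this illustration. The explicit `rows unbounded preceding` frame is spelled out here so the running sum does not depend on the engine's default frame.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "2018-12-10T14:46:50", "banana"),
    (1, "2018-12-10T15:46:50", "banana"),
    (1, "2018-12-10T16:46:50", "apple"),
    (1, "2018-12-10T17:46:50", "banana"),
]

# The two inner levels of the answer's query, without the final GROUP BY.
INNER_SQL = """
select f2.userid, f2.timestamp, f2.fruit, f2.ld, f2.lg,
       sum(case when lg = fruit then 0 else 1 end) over
           (partition by userid, fruit order by timestamp
            rows unbounded preceding) as sm
  from (
        select f1.*,
               lead(timestamp) over (partition by userid order by timestamp) as ld,
               lag(fruit)      over (partition by userid order by timestamp) as lg
          from fruits f1
       ) f2
 order by f2.timestamp
"""


def island_labels(rows):
    """Return (userid, timestamp, fruit, ld, lg, sm) for every input row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table fruits (userid integer, timestamp text, fruit text)")
    conn.executemany("insert into fruits values (?, ?, ?)", rows)
    result = conn.execute(INNER_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in island_labels(ROWS):
        print(row)
```

On the sample data the first two banana rows and the apple row get `sm = 1`, while the last banana row gets `sm = 2`, so `group by userid, fruit, sm` yields three groups. Within each group, `max(ld)` is the timestamp of the first row after the island, or NULL for the still-open final island.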

Answer 1 (score: 2)

Schema (MySQL v8.0)

CREATE TABLE t1 (
  `userid` INTEGER,
  `timestamp` VARCHAR(19),
  `fruit` VARCHAR(6)
);

INSERT INTO t1
  (`userid`, `timestamp`, `fruit`)
VALUES
  ('1', '2018-12-10T14:46:50', 'banana'),
  ('1', '2018-12-10T15:46:50', 'banana'),
  ('1', '2018-12-10T16:46:50', 'apple'),
  ('1', '2018-12-10T17:46:50', 'banana');

Query #1

A simple approach, if you don't mind getting several consecutive rows for the same fruit:

select userid, fruit, timestamp `start`, 
  lead(timestamp) over (order by timestamp) `end`
from t1;

| userid | fruit  | start               | end                 |
| ------ | ------ | ------------------- | ------------------- |
| 1      | banana | 2018-12-10T14:46:50 | 2018-12-10T15:46:50 |
| 1      | banana | 2018-12-10T15:46:50 | 2018-12-10T16:46:50 |
| 1      | apple  | 2018-12-10T16:46:50 | 2018-12-10T17:46:50 |
| 1      | banana | 2018-12-10T17:46:50 |                     |

Query #2

SELECT t2.* 
FROM   ( 
                SELECT   userid, 
                         fruit, 
                         timestamp `tstart`, 
                         CASE 
                                  WHEN fruit = Lead(fruit) over(ORDER BY timestamp) THEN lead(timestamp, 2) over ( ORDER BY timestamp)
                                  ELSE lead(timestamp, 1) over ( ORDER BY timestamp) 
                         end `tend`, 
                         CASE 
                                  WHEN fruit = lag(fruit) over (ORDER BY timestamp) THEN 1 
                                  ELSE 0 
                         end del 
                FROM     t1 ) t2 
WHERE  del = 0;


| userid | fruit  | tstart              | tend                | del |
| ------ | ------ | ------------------- | ------------------- | --- |
| 1      | banana | 2018-12-10T14:46:50 | 2018-12-10T16:46:50 | 0   |
| 1      | apple  | 2018-12-10T16:46:50 | 2018-12-10T17:46:50 | 0   |
| 1      | banana | 2018-12-10T17:46:50 |                     | 0   |

View on DB Fiddle
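
Query #2 looks ahead by exactly two rows when the next fruit matches, so it covers runs of at most two identical rows, and its windows are not partitioned by `userid`. One possible variant, sketched below, first keeps only the rows where the fruit differs from the previous row and then applies `lead` to what remains, which works for runs of any length. This is an illustration, not the answerer's method: it assumes Python's built-in `sqlite3` module with SQLite 3.25 or later rather than MySQL, the names `collapse_runs`, `marked`, and `prev_fruit` are invented here, and the third banana row in the sample is hypothetical test data added to create a run of three.

```python
import sqlite3

# Rows from the question plus one hypothetical extra banana row (15:56:50)
# so that the first run has three entries.
ROWS = [
    (1, "2018-12-10T14:46:50", "banana"),
    (1, "2018-12-10T15:46:50", "banana"),
    (1, "2018-12-10T15:56:50", "banana"),
    (1, "2018-12-10T16:46:50", "apple"),
    (1, "2018-12-10T17:46:50", "banana"),
]

# The WHERE clause runs before the outer window function, so lead() only
# sees the first row of each run and returns the start of the next run.
COLLAPSE_SQL = """
select userid, fruit, timestamp as tstart,
       lead(timestamp) over (partition by userid order by timestamp) as tend
  from (
        select t1.*,
               lag(fruit) over (partition by userid order by timestamp) as prev_fruit
          from t1
       ) marked
 where prev_fruit is null or prev_fruit <> fruit
 order by userid, tstart
"""


def collapse_runs(rows):
    """Return one (userid, fruit, tstart, tend) tuple per run of equal fruit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (userid integer, timestamp text, fruit text)")
    conn.executemany("insert into t1 values (?, ?, ?)", rows)
    result = conn.execute(COLLAPSE_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in collapse_runs(ROWS):
        print(row)
```

With the sample rows this returns three ranges: banana from 14:46:50 to 16:46:50, apple from 16:46:50 to 17:46:50, and an open-ended banana range starting at 17:46:50. Note that a NULL `fruit` value would be dropped by the `<>` comparison, so the sketch assumes the column is never NULL.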