I have a question about a SQL (Redshift) query I need. Basically, I have the following table:
userid | timestamp | fruit
1 | 2018-12-10T14:46:50 | banana
1 | 2018-12-10T15:46:50 | banana
1 | 2018-12-10T16:46:50 | apple
1 | 2018-12-10T17:46:50 | banana
Is it possible to produce a new table with the following information?
userid | start | end | fruit
1 | 2018-12-10T14:46:50 | 2018-12-10T16:46:50 | banana
1 | 2018-12-10T16:46:50 | 2018-12-10T17:46:50 | apple
1 | 2018-12-10T17:46:50 | | banana
It shows the time range during which the user kept each favorite-fruit choice.
Thanks!
D
Answer 0 (score: 3)
This is a classic gaps-and-islands problem, which can be solved with the lag and lead analytic functions as follows:
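-- sample data; replace this CTE with the real table
with fruits(userid, timestamp, fruit) as (
    values
        (1, '2018-12-10T14:46:50', 'banana'),
        (1, '2018-12-10T15:46:50', 'banana'),
        (1, '2018-12-10T16:46:50', 'apple'),
        (1, '2018-12-10T17:46:50', 'banana')
)
select userid, min(timestamp) as start, max(ld) as end, fruit
from (
    select f2.*,
           -- running count of fruit changes: rows of one uninterrupted run share the same sm
           -- (explicit frame: Redshift requires one for an aggregate window function with ORDER BY)
           sum(case when lg = fruit then 0 else 1 end) over
               (partition by userid, fruit order by timestamp
                rows between unbounded preceding and current row) sm
    from (
        select f1.*,
               -- ld: timestamp of the user's next row; lg: fruit of the user's previous row
               lead(timestamp) over (partition by userid order by timestamp) as ld,
               lag(fruit) over (partition by userid order by timestamp) as lg
        from fruits f1
    ) f2
) f
group by userid, fruit, sm
order by start;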
userid start end fruit
------- ------------------- ------------------- ------
1 2018-12-10T14:46:50 2018-12-10T16:46:50 banana
1 2018-12-10T16:46:50 2018-12-10T17:46:50 apple
1 2018-12-10T17:46:50 NULL banana
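To make the role of the intermediate columns easier to see, here is a minimal sketch that runs the same lag/lead logic and prints `ld`, `lg` and `sm` for each row before grouping. It is only an illustration under some assumptions: it uses Python's built-in `sqlite3` module as a stand-in engine (not Redshift), the table name `fruits` is reused from the CTE above, and the column is renamed to `ts` purely to avoid quoting `timestamp`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Redshift (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (userid INTEGER, ts TEXT, fruit TEXT)")
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?, ?)",
    [
        (1, "2018-12-10T14:46:50", "banana"),
        (1, "2018-12-10T15:46:50", "banana"),
        (1, "2018-12-10T16:46:50", "apple"),
        (1, "2018-12-10T17:46:50", "banana"),
    ],
)

# Inner steps: ld = next timestamp, lg = previous fruit,
# sm = running count of fruit changes, which labels each run ("island").
STEPS_SQL = """
SELECT userid, ts, fruit, ld, lg,
       SUM(CASE WHEN lg = fruit THEN 0 ELSE 1 END)
           OVER (PARTITION BY userid, fruit ORDER BY ts
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sm
FROM (
    SELECT f1.*,
           LEAD(ts)   OVER (PARTITION BY userid ORDER BY ts) AS ld,
           LAG(fruit) OVER (PARTITION BY userid ORDER BY ts) AS lg
    FROM fruits f1
) f2
"""

# Outer step: one output row per (userid, fruit, sm) group.
ISLANDS_SQL = f"""
SELECT userid, MIN(ts) AS start_ts, MAX(ld) AS end_ts, fruit
FROM ({STEPS_SQL}) f
GROUP BY userid, fruit, sm
ORDER BY start_ts
"""

steps = conn.execute(STEPS_SQL + " ORDER BY ts").fetchall()
islands = conn.execute(ISLANDS_SQL).fetchall()

for row in steps:
    print(row)    # userid, ts, fruit, ld, lg, sm
for row in islands:
    print(row)    # userid, start_ts, end_ts, fruit
```

The two leading banana rows get `sm = 1` and the final banana row gets `sm = 2`, which is why grouping by `userid, fruit, sm` keeps the two banana periods apart.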
Answer 1 (score: 2)
Schema (MySQL v8.0)
CREATE TABLE t1 (
    `userid` INTEGER,
    `timestamp` VARCHAR(19),
    `fruit` VARCHAR(6)
);
INSERT INTO t1
    (`userid`, `timestamp`, `fruit`)
VALUES
    (1, '2018-12-10T14:46:50', 'banana'),
    (1, '2018-12-10T15:46:50', 'banana'),
    (1, '2018-12-10T16:46:50', 'apple'),
    (1, '2018-12-10T17:46:50', 'banana');
Query #1
A simple approach, if you don't mind getting one row per record even when the same fruit repeats consecutively:
select userid, fruit, timestamp `start`,
       lead(timestamp) over (partition by userid order by timestamp) `end`
from t1;
| userid | fruit | start | end |
| ------ | ------ | ------------------- | ------------------- |
| 1 | banana | 2018-12-10T14:46:50 | 2018-12-10T15:46:50 |
| 1 | banana | 2018-12-10T15:46:50 | 2018-12-10T16:46:50 |
| 1 | apple | 2018-12-10T16:46:50 | 2018-12-10T17:46:50 |
| 1 | banana | 2018-12-10T17:46:50 | |
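Or Query #2, which collapses consecutive rows with the same fruit. Note that it looks exactly two rows ahead, so it only handles runs of at most two identical rows:
SELECT t2.*
FROM (
    SELECT userid,
           fruit,
           timestamp `tstart`,
           -- end of the range: skip one extra row if the next row has the same fruit
           CASE
               WHEN fruit = LEAD(fruit) OVER (PARTITION BY userid ORDER BY timestamp)
                   THEN LEAD(timestamp, 2) OVER (PARTITION BY userid ORDER BY timestamp)
               ELSE LEAD(timestamp, 1) OVER (PARTITION BY userid ORDER BY timestamp)
           END `tend`,
           -- del = 1 marks a row that repeats the previous row's fruit
           CASE
               WHEN fruit = LAG(fruit) OVER (PARTITION BY userid ORDER BY timestamp) THEN 1
               ELSE 0
           END del
    FROM t1
) t2
WHERE del = 0;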
| userid | fruit | tstart | tend | del |
| ------ | ------ | ------------------- | ------------------- | --- |
| 1 | banana | 2018-12-10T14:46:50 | 2018-12-10T16:46:50 | 0 |
| 1 | apple | 2018-12-10T16:46:50 | 2018-12-10T17:46:50 | 0 |
| 1 | banana | 2018-12-10T17:46:50 | | 0 |
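Because Query #2 uses `lead(timestamp, 2)`, a run of three or more identical rows would get an end time that still lies inside the run. One possible way around this, sketched below, is to first keep only the rows where the fruit changes and then take `LEAD` over those remaining rows. This is a different technique from the query above, not the answer's own method. The sketch makes several assumptions: it runs on Python's built-in `sqlite3` module rather than MySQL, the column is renamed to `ts`, `prev_fruit` is a hypothetical helper column, and the extra 15:56:50 row is made-up data added only to create a run of three.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (userid INTEGER, ts TEXT, fruit TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, "2018-12-10T14:46:50", "banana"),
        (1, "2018-12-10T15:46:50", "banana"),
        (1, "2018-12-10T15:56:50", "banana"),  # made-up row: run of three
        (1, "2018-12-10T16:46:50", "apple"),
        (1, "2018-12-10T17:46:50", "banana"),
    ],
)

# Step 1 (subquery): attach the previous fruit to every row.
# Step 2 (WHERE): keep only rows that start a new run.
# Step 3 (LEAD): WHERE is applied before window functions in the outer
# query, so LEAD sees only run starts and returns the next run's start.
RANGES_SQL = """
SELECT userid, fruit, ts AS tstart,
       LEAD(ts) OVER (PARTITION BY userid ORDER BY ts) AS tend
FROM (
    SELECT t1.*,
           LAG(fruit) OVER (PARTITION BY userid ORDER BY ts) AS prev_fruit
    FROM t1
) t2
WHERE prev_fruit IS NULL OR prev_fruit <> fruit
ORDER BY tstart
"""

ranges = conn.execute(RANGES_SQL).fetchall()
for row in ranges:
    print(row)    # userid, fruit, tstart, tend
```

With this shape the length of a run no longer matters, since the repeated rows are filtered out before the end time is computed.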