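I have a dataset in Athena, so for the purposes of this question you can treat it as a Postgres database. A sample of the data can be seen in this SQL Fiddle.
What I want is a dataset that contains all the values but flags the maximum of each consecutive run of 'p' rows and the minimum of each consecutive run of 'v' rows.
Here is a sample: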
create table vals (
timestamp int,
type varchar(25),
val int
);
insert into vals(timestamp,type, val)
values (10, null, 1),
(20, null, 2),
(39, null, 1),
(40,'p',1),
(50,'p',2),
(60,'p',1),
(70,'v',5),
(80,'v',6),
(90,'v',6),
(100,'v',3),
(110,null,3),
(120,'v',6),
(130,null,3),
(140,'p',10),
(150,'p',8),
(160,null,3),
(170,'p',1),
(180,'p',2),
(190,'p',2),
(200,'p',1),
(210,null,3),
(220,'v',1),
(230,'v',1),
(240,'v',3),
(250,'v',41)
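There is plenty of freedom in the type of is_peak: some kind of dense rank or an increasing number would be fine, as long as I can be sure that, within each consecutive run, the "marked" row is the highest or lowest value.
Good luck, and thanks for the help.
Note: the maximum of a peak or the minimum of a valley can be anywhere within the consecutive set, but once the type changes we start over.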
Answer 0 (score: 3)
You can use the LEAD/LAG window functions:
SELECT *,
CASE WHEN type = 'p' AND val>LAG(val) OVER(PARTITION BY type ORDER BY timestamp)
AND val > LEAD(val) OVER(PARTITION BY type ORDER BY timestamp) THEN 1
WHEN type = 'v' AND val<LAG(val) OVER(PARTITION BY type ORDER BY timestamp)
AND val < LEAD(val) OVER(PARTITION BY type ORDER BY timestamp) THEN 1
END AS is_peak
FROM vals
ORDER BY timestamp;
Output:
┌───────────┬───────┬──────┬─────────┐
│ timestamp │ type │ val │ is_peak │
├───────────┼───────┼──────┼─────────┤
│ 10 │ │ 1 │ │
│ 20 │ │ 2 │ │
│ 39 │ │ 1 │ │
│ 40 │ p │ 1 │ │
│ 50 │ p │ 2 │ 1 │
│ 60 │ p │ 1 │ │
│ 70 │ v │ 5 │ │
│ 80 │ v │ 6 │ │
│ 90 │ v │ 6 │ │
│ 100 │ v │ 3 │ 1 │
│ 110 │ │ 3 │ │
│ 120 │ v │ 6 │ │
│ 130 │ │ 3 │ │
│ 140 │ p │ 10 │ 1 │
│ 150 │ p │ 8 │ │
└───────────┴───────┴──────┴─────────┘
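To see what the CASE expression above actually compares, here is a minimal sketch that prints each row's neighbours. It assumes SQLite 3.25 or later, driven from Python's sqlite3 module, as a stand-in for Athena; the aliases prev_val and next_val are made up for illustration and only the first run of 'p' rows from the sample is loaded.

```python
import sqlite3

# First run of 'p' rows from the question's sample data.
rows = [(40, "p", 1), (50, "p", 2), (60, "p", 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (timestamp INT, type VARCHAR(25), val INT)")
conn.executemany("INSERT INTO vals VALUES (?, ?, ?)", rows)

# prev_val / next_val are hypothetical aliases for the LAG and LEAD results.
neighbours = conn.execute(
    """
    SELECT timestamp, val,
           LAG(val)  OVER (PARTITION BY type ORDER BY timestamp) AS prev_val,
           LEAD(val) OVER (PARTITION BY type ORDER BY timestamp) AS next_val
    FROM vals
    ORDER BY timestamp
    """
).fetchall()

for row in neighbours:
    print(row)
```

The first and last rows of a partition have no neighbour on one side, so LAG or LEAD returns NULL there. A comparison with NULL is never true, which is why rows at the edge of a partition are never flagged by this query.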
Version with a WINDOW clause:
SELECT *, CASE WHEN type = 'p' AND val > LAG(val) OVER s
AND val > LEAD(val) OVER s THEN 1
WHEN type = 'v' AND val < LAG(val) OVER s
AND val < LEAD(val) OVER s THEN 1
END AS is_peak
FROM vals
WINDOW s AS (PARTITION BY type ORDER BY timestamp)
ORDER BY timestamp;
EDIT
"I think that with a small change we would also get timestamp 120, and that would be it."
SELECT *,CASE
WHEN type IN ('p','v') AND val > LAG(val,1,0) OVER(PARTITION BY type ORDER BY timestamp)
AND val > LEAD(val,1,0) OVER(PARTITION BY type ORDER BY timestamp) THEN 1
WHEN type IN ('v') AND val < LAG(val,1,0) OVER(PARTITION BY type ORDER BY timestamp)
AND val < LEAD(val,1,0) OVER(PARTITION BY type ORDER BY timestamp) THEN 1
END AS is_peak
FROM vals
ORDER BY timestamp;
EDIT 2:
Final solution with gaps-and-islands detection (handling plateaus):
WITH cte AS (
SELECT *, LEAD(val,1,0) OVER(PARTITION BY type ORDER BY timestamp) AS l
FROM vals
), cte2 AS (
SELECT *, SUM(CASE WHEN val = l THEN 1 ELSE 0 END) OVER(PARTITION BY type ORDER BY timestamp) AS dr
FROM cte
), cte3 AS (
SELECT *, CASE WHEN type IN ('p') AND val > LAG(val,1) OVER(PARTITION BY type ORDER BY timestamp)
AND val >= LEAD(val,1) OVER(PARTITION BY type ORDER BY timestamp) THEN 1
WHEN type IN ('v') AND val < LAG(val,1) OVER(PARTITION BY type ORDER BY timestamp)
AND val <= LEAD(val,1) OVER(PARTITION BY type ORDER BY timestamp) THEN 1
END AS is_peak
FROM cte2
)
SELECT timestamp, type, val,
CASE WHEN is_peak = 1 THEN 1
WHEN EXISTS (SELECT 1 FROM cte3 cx
WHERE cx.is_peak = 1
AND cx.val = cte3.val
AND cx.type = cte3.type
AND cx.dr = cte3.dr)
THEN 1
END is_peak
FROM cte3
ORDER BY timestamp;
Output:
┌────────────┬───────┬──────┬─────────┐
│ timestamp │ type │ val │ is_peak │
├────────────┼───────┼──────┼─────────┤
│ 10 │ │ 1 │ │
│ 20 │ │ 2 │ │
│ 39 │ │ 1 │ │
│ 40 │ p │ 1 │ │
│ 50 │ p │ 2 │ 1 │
│ 60 │ p │ 1 │ │
│ 70 │ v │ 5 │ │
│ 80 │ v │ 6 │ │
│ 90 │ v │ 6 │ │
│ 100 │ v │ 3 │ 1 │
│ 110 │ │ 3 │ │
│ 120 │ v │ 6 │ │
│ 130 │ │ 3 │ │
│ 140 │ p │ 10 │ 1 │
│ 150 │ p │ 8 │ │
│ 160 │ │ 3 │ │
│ 170 │ p │ 1 │ │
│ 180 │ p │ 2 │ 1 │
│ 190 │ p │ 2 │ 1 │
│ 200 │ p │ 1 │ │
│ 210 │ │ 3 │ │
│ 220 │ v │ 1 │ 1 │
│ 230 │ v │ 1 │ 1 │
│ 240 │ v │ 3 │ │
│ 250 │ v │ 41 │ │
└────────────┴───────┴──────┴─────────┘
Additional note:
ISO SQL:2016 added pattern matching with MATCH_RECOGNIZE for this kind of scenario: you define a regular expression over rows that describes a peak. At the time of writing, however, only Oracle supported it.
Related article: Modern SQL - match_recognize Regular Expressions Over Rows
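Answer 1 (score: 3)
There is a small trick for solving "gaps and islands" problems like this one.
Subtracting a row_number partitioned by some value from an overall row_number gives you a kind of ranking that stays constant within each consecutive run.
For some purposes this method has drawbacks, but it works for this case.
Once the ranking has been calculated, other window functions in the outer query can use it.
We can use row_number again, but depending on the requirements you could use DENSE_RANK or the MIN and MAX window functions instead.
Then we simply wrap them in a CASE to apply different logic depending on the type.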
select timestamp, type, val,
(case
when type = 'v' and row_number() over (partition by (rn1-rn2), type order by val, rn1) = 1 then 1
when type = 'p' and row_number() over (partition by (rn1-rn2), type order by val desc, rn1) = 1 then 1
end) is_peak
-- , rn1, rn2, (rn1-rn2) as rnk
from
(
select timestamp, type, val,
row_number() over (order by timestamp) as rn1,
row_number() over (partition by type order by timestamp) as rn2
from vals
) q
order by timestamp;
You can test it on SQL Fiddle here.
Returns:
timestamp type val is_peak
--------- ---- ---- -------
10 null 1 null
20 null 2 null
39 null 1 null
40 p 1 null
50 p 2 1
60 p 1 null
70 v 5 null
80 v 6 null
90 v 6 null
100 v 3 1
110 null 3 null
120 v 6 1
130 null 3 null
140 p 10 1
150 p 8 null
160 null 3 null
170 p 1 null
180 p 2 1
190 p 2 null
200 p 1 null
210 null 3 null
220 v 1 1
230 v 1 null
240 v 3 null
250 v 41 null
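The answer mentions that the MIN and MAX window functions could be used instead of row_number. The following is a minimal sketch of that variant, not the author's own query: it assumes SQLite 3.25 or later run through Python's sqlite3 module as a stand-in for Athena, uses a subset of the sample rows, and exposes rn1 - rn2 under the made-up alias island so the grouping is visible.

```python
import sqlite3

# Subset of the question's sample data: a 'p' run, a NULL row, then a 'v' run.
rows = [(170, "p", 1), (180, "p", 2), (190, "p", 2), (200, "p", 1),
        (210, None, 3),
        (220, "v", 1), (230, "v", 1), (240, "v", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (timestamp INT, type VARCHAR(25), val INT)")
conn.executemany("INSERT INTO vals VALUES (?, ?, ?)", rows)

# "island" is a hypothetical alias: rn1 - rn2 is constant within one
# consecutive run of the same type, so it identifies the run.
islands = conn.execute(
    """
    SELECT timestamp, type, val, rn1 - rn2 AS island,
           CASE
             WHEN type = 'p' AND val = MAX(val) OVER (PARTITION BY type, rn1 - rn2) THEN 1
             WHEN type = 'v' AND val = MIN(val) OVER (PARTITION BY type, rn1 - rn2) THEN 1
           END AS is_peak
    FROM (
        SELECT timestamp, type, val,
               ROW_NUMBER() OVER (ORDER BY timestamp) AS rn1,
               ROW_NUMBER() OVER (PARTITION BY type ORDER BY timestamp) AS rn2
        FROM vals
    ) q
    ORDER BY timestamp
    """
).fetchall()

for row in islands:
    print(row)
```

Unlike the row_number version, which flags only the first row when values tie, comparing against MAX or MIN flags every tied row in the run. In this subset that means both 180 and 190 are marked for 'p', and both 220 and 230 for 'v'.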
Answer 2 (score: 1)
You can achieve this with subqueries inside a CASE expression:
create table #vals
(
[timestamp] int,
[type] varchar(25),
val int
);
insert into #vals ([timestamp], [type], val)
values (10, null, 1),
(20, null, 2),
(30, null, 1),
(40,'p',1),
(50,'p',2),
(60,'p',1),
(70,'v',5),
(80,'v',6),
(90,'v',6),
(100,'v',3),
(110,null,3)
select
r.*,
case
when r.[type] = 'p' and not exists (select * from #vals c where c.[type] = r.[type] and c.val > r.val) then 1
when r.[type] = 'v' and not exists (select * from #vals c where c.[type] = r.[type] and c.val < r.val) then 1
else null
end as is_peak
from #vals r
drop table #vals
Result:
/----------------------------------\
| timestamp | type | val | is_peak |
|-----------|------|-----|---------|
| 10 | NULL | 1 | NULL |
| 20 | NULL | 2 | NULL |
| 30 | NULL | 1 | NULL |
| 40 | p | 1 | NULL |
| 50 | p | 2 | 1 |
| 60 | p | 1 | NULL |
| 70 | v | 5 | NULL |
| 80 | v | 6 | NULL |
| 90 | v | 6 | NULL |
| 100 | v | 3 | 1 |
| 110 | NULL | 3 | NULL |
\----------------------------------/
Note: if several records share the same (peak) val, each of them will be marked with 1 in the is_peak column.