How can I find data spikes using SQL?

Time: 2019-04-10 15:56:02

Tags: sql postgresql

Say I have the following schema:

SENSOR
--------------
ID (numeric)
READ_DATE (date)
VALUE (numeric)

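I want to find spikes in the data that last at least X days. We only take one reading from the sensor per day, so ID and READ_DATE are nearly interchangeable in terms of uniqueness.

For example, I have the following records: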

1, 2019-01-01, 100
2, 2019-01-02, 1000
3, 2019-01-03, 1500
4, 2019-01-04, 1100
5, 2019-01-05, 500
6, 2019-01-06, 700
7, 2019-01-07, 1500
8, 2019-01-08, 2000

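In this example, for X = 2 and VALUE >= 1000, I want to get rows 3, 4, and 8, because the pairs (2, 3), (3, 4), and (7, 8) are consecutive readings >= 1000.

I'm not sure how to approach this. I was thinking of using a COUNT window function, but I don't know how to check whether X consecutive records are >= 1000.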
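To pin down the expected output before looking at SQL, here is a minimal reference sketch in Python of the rule described above. The function name `spike_rows` and its parameters are illustrative only, and it assumes that a row qualifies when it is at least the X-th consecutive daily reading at or above the threshold.

```python
from datetime import date, timedelta

# Sample readings from the question: (id, read_date, value).
READINGS = [
    (1, date(2019, 1, 1), 100),
    (2, date(2019, 1, 2), 1000),
    (3, date(2019, 1, 3), 1500),
    (4, date(2019, 1, 4), 1100),
    (5, date(2019, 1, 5), 500),
    (6, date(2019, 1, 6), 700),
    (7, date(2019, 1, 7), 1500),
    (8, date(2019, 1, 8), 2000),
]


def spike_rows(readings, threshold=1000, min_days=2):
    """Return ids of rows that are at least the min_days-th consecutive
    daily reading with value >= threshold (hypothetical helper)."""
    result = []
    run_length = 0
    previous_date = None
    for row_id, read_date, value in sorted(readings, key=lambda r: r[1]):
        # A run only continues if the previous reading was exactly one day earlier.
        adjacent = (
            previous_date is not None
            and read_date - previous_date == timedelta(days=1)
        )
        if value >= threshold:
            run_length = run_length + 1 if (adjacent and run_length > 0) else 1
        else:
            run_length = 0
        if run_length >= min_days:
            result.append(row_id)
        previous_date = read_date
    return result


print(spike_rows(READINGS))              # [3, 4, 8]
print(spike_rows(READINGS, min_days=3))  # [4]
```

Any SQL solution can be checked against this output for the sample data.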

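4 Answers:

Answer 0: (score: 0)

This is about as generic as I think I can make it.

First, I create some data using a table variable, but this could just as well be a temp or physical table: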

DECLARE @table TABLE (id INT, [date] DATE, [value] INT);
INSERT INTO @table SELECT 1, '20190101', 100;
INSERT INTO @table SELECT 2, '20190102', 1000;
INSERT INTO @table SELECT 3, '20190103', 1500;
INSERT INTO @table SELECT 4, '20190104', 1100;
INSERT INTO @table SELECT 5, '20190105', 500;
INSERT INTO @table SELECT 6, '20190106', 700;
INSERT INTO @table SELECT 7, '20190107', 1500;
INSERT INTO @table SELECT 8, '20190108', 2000;

Then I use a CTE (you could swap it for a subquery, at some cost in efficiency):

WITH x AS (
    SELECT 
        *,
        CASE WHEN [value] >= 1000 THEN 1 END AS spike
    FROM 
        @table)
SELECT
    x2.id,
    x2.[date],
    x2.[value]
FROM
    x x1
    INNER JOIN x x2 ON x2.id = x1.id + 1
WHERE
    x1.spike = 1
    AND x2.spike = 1;

This assumes your IDs are consecutive; if they aren't, you would need to join on the date instead, which is messier.

Results:

id  date        value
3   2019-01-03  1500
4   2019-01-04  1100
8   2019-01-08  2000
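When the IDs are not consecutive, the same self-join can be keyed on the date. The sketch below is only an illustration of that idea: it runs against an in-memory SQLite database through Python's `sqlite3` module, stores `read_date` as ISO text, and uses made-up non-consecutive IDs. In SQLite the "next day" is written `date(x1.read_date, '+1 day')`; in PostgreSQL, with a `date` column, the equivalent join condition would be `x2.read_date = x1.read_date + 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER, read_date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO sensor VALUES (?, ?, ?)",
    [
        # Same dates and values as the question, but with gaps in the ids
        # (the ids are hypothetical, chosen to break the id + 1 join).
        (10, "2019-01-01", 100),
        (20, "2019-01-02", 1000),
        (35, "2019-01-03", 1500),
        (40, "2019-01-04", 1100),
        (41, "2019-01-05", 500),
        (50, "2019-01-06", 700),
        (70, "2019-01-07", 1500),
        (90, "2019-01-08", 2000),
    ],
)

# Pair each reading with the reading taken exactly one day later,
# and keep the later row when both are at or above the threshold.
rows = conn.execute(
    """
    SELECT x2.id, x2.read_date, x2.value
    FROM sensor x1
    JOIN sensor x2 ON x2.read_date = date(x1.read_date, '+1 day')
    WHERE x1.value >= 1000
      AND x2.value >= 1000
    ORDER BY x2.read_date
    """
).fetchall()

for row in rows:
    print(row)
```

Like the ID-based join, this only covers X = 2; longer spikes would need one extra join per additional day.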

OK, this isn't Postgres, and it isn't very generic either (it relies on a recursive CTE), but it seems to work:

DECLARE @spike_length INT = 3;

WITH x AS (
    SELECT 
        *,
        CASE WHEN [value] >= 1000 THEN 1 ELSE 0 END AS spike
    FROM 
        @table),
y AS (
    SELECT
        x.id,
        x.[date],
        x.[value],
        x.spike AS spike_length
    FROM
        x
    WHERE
        id = 1
    UNION ALL
    SELECT
        x.id,
        x.[date],
        x.[value],
        CASE WHEN x.spike = 0 THEN 0 ELSE y.spike_length + 1 END
    FROM
        y
        INNER JOIN x ON x.id = y.id + 1)
SELECT * FROM y WHERE spike_length >= @spike_length;

Results:

id  date        value   spike_length
4   2019-01-04  1100    3

答案 1 :(得分:0)

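If you are able to use analytic functions, you should be able to do something like the following to get what you need (I changed the limit from 1000 to 1500, otherwise it would bring back all the rows that successively add up to 1000 or more):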

CREATE TABLE test1 (
    id number,
    value number
);

insert all
    into test1 (id, value) values (1, 100)
    into test1 (id, value) values (2, 1000)
    into test1 (id, value) values (3, 1500)
    into test1 (id, value) values (4, 1100)
    into test1 (id, value) values (5, 500)
    into test1 (id, value) values (6, 700)
    into test1 (id, value) values (7, 1500)
    into test1 (id, value) values (8, 2000)
select * from dual;

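Edit: after re-reading the question and the comments, I reworked this to answer the actual question! It uses two LAGs: one to check that the previous day was also 1000 or more, and another to count how many times in a row that happened, for the X filter.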

SELECT * FROM 
(
    SELECT id,
        value, 
        spike, 
        CASE WHEN spike = 0 THEN 0 ELSE (spike + LAG(spike, 1, 0) OVER (ORDER BY id) + 1) END as SPIKE_LENGTH
    FROM (
        select id,
            value, 
            CASE WHEN LAG(value, 1, 0) OVER (ORDER BY id) >= 1000 AND value >= 1000 THEN 1 ELSE 0 END AS SPIKE
        from test1
        )
)
WHERE spike_length >= 2;

Which returns:

ID  Value  spike  spike_length
3   1500   1      2
4   1100   1      3
8   2000   1      2

If you raise the spike-length filter to >= 3, you get only ID 4, the only row that ends a run of 3 consecutive values of 1000 or more.

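Answer 2: (score: 0)

You can treat this as a gaps-and-islands problem: finding runs of consecutive values that exceed a threshold. The following gets the first date of each such run: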

select min(s.read_date) as first_date
from (select s.*,
             row_number() over (order by read_date) as seqnum
      from sensor s
      where value >= 1000
     ) s
group by (read_date - seqnum * interval '1 day')
having count(*) >= 2;

The observation here is that (read_date - seqnum * interval '1 day') is constant across rows on adjacent days.
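To see why that difference stays constant within a run, here is a small Python sketch that reproduces the arithmetic on the sample data. It starts from the dates of the rows that pass the `value >= 1000` filter, numbers them in date order as `row_number()` would, and subtracts that many days from each date; the variable names are illustrative.

```python
from datetime import date, timedelta

# read_date of the sample rows with value >= 1000 (ids 2, 3, 4, 7, 8).
qualifying = [
    date(2019, 1, 2),
    date(2019, 1, 3),
    date(2019, 1, 4),
    date(2019, 1, 7),
    date(2019, 1, 8),
]

# Group the dates by (read_date - seqnum days); each key identifies one island.
groups = {}
for seqnum, read_date in enumerate(sorted(qualifying), start=1):
    key = read_date - timedelta(days=seqnum)
    groups.setdefault(key, []).append(read_date)

for key, dates in groups.items():
    print(key.isoformat(), [d.isoformat() for d in dates])
# 2019-01-01 ['2019-01-02', '2019-01-03', '2019-01-04']
# 2019-01-03 ['2019-01-07', '2019-01-08']
```

Within a run, the date and the row number both advance by one per row, so their difference does not change; a gap in the dates shifts the difference and starts a new group.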

You can get the original rows with one more level of subquery:

select s.*
from (select s.*,
             count(*) over (partition by (read_date - seqnum * interval '1 day')) as cnt
      from (select s.*,
                   row_number() over (order by read_date) as seqnum
            from sensor s
            where value >= 1000
           ) s
     ) s
where cnt >= 2;

Answer 3: (score: 0)

I ended up with the following:

-- this part helps filter out values < 1000 later on
with a as (
    select *,
    case when value >= 1000 then 1 else 0 end as indicator
    from sensor),
-- using the indicator, create a window that calculates the length of the spike
b as (
    select *,
    sum(indicator) over (order by id asc rows between 2 preceding and current row) as spike
    from a)
-- now filter out all spikes < 3
-- (because the window has a size of 3, it can never be larger than 3, so = 3 is okay)
select id, value from b where spike = 3;

This builds on @Gordon Linoff's answer, which I found too complicated.
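The fixed window of 3 rows above can be generalized to any X by making the frame X - 1 rows wide and comparing the sum with X. The following is a minimal sketch of that idea, not a tested PostgreSQL solution: it runs the query against an in-memory SQLite database through Python's `sqlite3` module (window functions need SQLite 3.25 or later), the helper name `spike_ids` is hypothetical, and, like the original query, it assumes exactly one row per day with no missing days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER, read_date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO sensor VALUES (?, ?, ?)",
    [
        (1, "2019-01-01", 100),
        (2, "2019-01-02", 1000),
        (3, "2019-01-03", 1500),
        (4, "2019-01-04", 1100),
        (5, "2019-01-05", 500),
        (6, "2019-01-06", 700),
        (7, "2019-01-07", 1500),
        (8, "2019-01-08", 2000),
    ],
)


def spike_ids(conn, threshold, min_days):
    """Ids of rows whose last min_days readings are all >= threshold
    (hypothetical helper; assumes one reading per day, no gaps)."""
    # The frame width is formatted into the SQL text as a validated integer,
    # since this sketch does not rely on binding it as a parameter.
    preceding = int(min_days) - 1
    sql = f"""
        WITH a AS (
            SELECT id, value,
                   CASE WHEN value >= ? THEN 1 ELSE 0 END AS indicator
            FROM sensor
        ),
        b AS (
            SELECT id, value,
                   SUM(indicator) OVER (
                       ORDER BY id
                       ROWS BETWEEN {preceding} PRECEDING AND CURRENT ROW
                   ) AS spike
            FROM a
        )
        SELECT id FROM b WHERE spike = ? ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (threshold, min_days))]


print(spike_ids(conn, 1000, 2))  # [3, 4, 8]
print(spike_ids(conn, 1000, 3))  # [4]
```

Because the sum can never exceed the number of rows in the frame, `spike = X` is equivalent to "all X rows in the frame are at or above the threshold".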