SQL: select consecutive records with the same value

Time: 2018-08-22 10:14:18

Tags: sql sql-server-2014 gaps-and-islands

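I have a table of measurements. A measurement is taken every minute. I need to select the rows where the same device_id reports the same sample_value two or more times in a row.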

Here is the initial data:

    sample_date sample_time device_id   sample_value
    20180701    1010        111         11
    20180701    1011        111         12
    20180701    1012        111         13
    20180701    1013        222         11
    20180701    1014        222         11
    20180701    1015        222         12
    20180701    1016        111         12
    20180701    1017        111         11
    20180701    1018        222         13
    20180701    1019        222         12
    20180701    1020        222         13
    20180701    1021        222         12
    20180701    1022        222         12
    20180701    1023        111         12
    20180701    1024        111         13
    20180701    1025        111         13
    20180701    1026        111         12
    20180701    1027        111         13
    20180701    1028        222         14
    20180701    1029        222         13
    20180701    1030        222         14
    20180701    1031        222         14
    20180701    1032        222         14
    20180701    1033        222         14
    20180701    1034        222         14
    20180701    1035        222         14
    20180701    1036        111         13
    20180701    1037        111         13
    20180701    1038        111         14
    20180701    1039        111         13

This is the result I am looking for:

    sample_date sample_time device_id   sample_value
    20180701    1013        222         11
    20180701    1014        222         11
    20180701    1021        222         12
    20180701    1022        222         12
    20180701    1024        111         13
    20180701    1025        111         13
    20180701    1030        222         14
    20180701    1031        222         14
    20180701    1032        222         14
    20180701    1033        222         14
    20180701    1034        222         14
    20180701    1035        222         14
    20180701    1036        111         13
    20180701    1037        111         13

Here is the test data:

IF OBJECT_ID('samples', 'U') IS NOT NULL 
DROP TABLE samples; 

create table samples (
sample_date int,
sample_time int,
device_id int,
sample_value int
)

insert samples
values
(20180701, 1010, 111, 11)
,(20180701, 1011, 111, 12)
,(20180701, 1012, 111, 13)
,(20180701, 1013, 222, 11)
,(20180701, 1014, 222, 11)
,(20180701, 1015, 222, 12)
,(20180701, 1016, 111, 12)
,(20180701, 1017, 111, 11)
,(20180701, 1018, 222, 13)
,(20180701, 1019, 222, 12)
,(20180701, 1020, 222, 13)
,(20180701, 1021, 222, 12)
,(20180701, 1022, 222, 12)
,(20180701, 1023, 111, 12)
,(20180701, 1024, 111, 13)
,(20180701, 1025, 111, 13)
,(20180701, 1026, 111, 12)
,(20180701, 1027, 111, 13)
,(20180701, 1028, 222, 14)
,(20180701, 1029, 222, 13)
,(20180701, 1030, 222, 14)
,(20180701, 1031, 222, 14)
,(20180701, 1032, 222, 14)
,(20180701, 1033, 222, 14)
,(20180701, 1034, 222, 14)
,(20180701, 1035, 222, 14)
,(20180701, 1036, 111, 13)
,(20180701, 1037, 111, 13)
,(20180701, 1038, 111, 14)
,(20180701, 1039, 111, 13)

select * from samples

This is the SQL I am trying to use, but I don't know how to set up the correct partitioning.

    select *
    from (select    sample_date,
                    sample_time,
                    device_id,
                    sample_value,
                    row_number() over (partition by sample_date,
                                                    device_id,
                                                    sample_value
                                            order by sample_date,
                                                    sample_time,
                                                    device_id) as occurrence
    from samples) t
    where     occurrence > 1

Similar topics:

Select statement to find duplicates on certain fields

How to find consecutive rows based on the value of a column?

3 answers:

Answer 0 (score: 1)

If you want to do this without LEAD/LAG, you could do the following instead:

WITH Ordered AS (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY sample_date, sample_time) AS order_id
    FROM
        samples)
SELECT
    s1.sample_date,
    s1.sample_time,
    s1.device_id,
    s1.sample_value
FROM
    Ordered s1
    INNER JOIN Ordered s2 ON s2.device_id = s1.device_id AND s2.sample_value = s1.sample_value AND s2.order_id = s1.order_id + 1
UNION
SELECT
    s2.sample_date,
    s2.sample_time,
    s2.device_id,
    s2.sample_value
FROM
    Ordered s1
    INNER JOIN Ordered s2 ON s2.device_id = s1.device_id AND s2.sample_value = s1.sample_value AND s2.order_id = s1.order_id + 1
ORDER BY
    1, 2;

The result is:

sample_date sample_time device_id   sample_value
20180701    1013        222         11
20180701    1014        222         11
20180701    1021        222         12
20180701    1022        222         12
20180701    1024        111         13
20180701    1025        111         13
20180701    1030        222         14
20180701    1031        222         14
20180701    1032        222         14
20180701    1033        222         14
20180701    1034        222         14
20180701    1035        222         14
20180701    1036        111         13
20180701    1037        111         13

Answer 1 (score: 0)

I think you want to use lag() / lead():

select s.*
from (select s.*,
             lag(device_id) over (order by sample_date, sample_time) as prev_di,
             lead(device_id) over (order by sample_date, sample_time) as next_di,
             lag(sample_value) over (order by sample_date, sample_time) as prev_sv,
             lead(sample_value) over (order by sample_date, sample_time) as next_sv
      from samples s
     ) s
where (prev_sv = sample_value and prev_di = device_id) or
      (next_sv = sample_value and next_di = device_id);

Here is a SQL Fiddle.

If you specifically want the adjacent row to be the next time unit, you can use exists:

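select s.*
from samples s
where exists (select 1
              from samples s2 
              where s2.sample_date = s.sample_date and
                    s2.device_id = s.device_id and
                    s2.sample_value = s.sample_value and
                    -- integer HHMM arithmetic: does not cross hour boundaries (1059 + 1 = 1060)
                    s2.sample_time in (s.sample_time - 1, s.sample_time + 1)
             );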
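
One caveat with comparing sample_time as an integer: the value is stored as HHMM, so 1059 + 1 gives 1060 rather than 1100, and the minute before midnight falls on a different sample_date. Below is a minimal sketch of a variant that compares real datetime values instead. It assumes the columns always hold valid yyyymmdd / HHMM integers; the `stamped` CTE and its `date_time` column are names introduced here for illustration only.

```sql
-- Sketch: build a datetime from the integer columns, then look for a row
-- of the same device and value exactly one minute before or after.
with stamped as (
    select s.*,
           datetimefromparts(sample_date / 10000,        -- year
                             (sample_date / 100) % 100,  -- month
                             sample_date % 100,          -- day
                             sample_time / 100,          -- hour
                             sample_time % 100,          -- minute
                             0, 0) as date_time
    from samples s
)
select s.sample_date, s.sample_time, s.device_id, s.sample_value
from stamped s
where exists (select 1
              from stamped s2
              where s2.device_id = s.device_id
                and s2.sample_value = s.sample_value
                and s2.date_time in (dateadd(minute, -1, s.date_time),
                                     dateadd(minute,  1, s.date_time)))
order by s.sample_date, s.sample_time;
```

Unlike the lag() / lead() query, this treats a missing minute as a break in the run, which may or may not be what you want.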

Answer 2 (score: 0)

You could try the following query:

select date_time,
       device_id,
       sample_value
from ( 
    select date_time,
           device_id,
           sample_value,
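           -- rnDiff alone can collide across different (device_id, sample_value) pairs
           COUNT(*) over (partition by device_id, sample_value, rnDiff) cnt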
    from (
        select date_time,
               device_id,
               sample_value,
               ROW_NUMBER() over (order by date_time) -
               ROW_NUMBER() over (partition by device_id, sample_value order by date_time) rnDiff
        from (
            select DATETIMEFROMPARTS(sample_date/10000,(sample_date/100)%100,sample_date%100,sample_time/100,sample_time%100,0,0) date_time,
                   device_id,
                   sample_value
            from samples
        ) a 
    ) a
) a where cnt > 1
order by date_time

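In the innermost query I convert your date and time columns to a datetime value so that they can be sorted easily. Then I use the difference of two row_number() calls to tell apart the groups of consecutive rows with the same sample_value. Finally, in the outer queries, I use a windowed COUNT(*) over the rnDiff groups to count the rows in each group and keep only the groups with more than one row.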
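
To see why the difference of the two row numbers identifies a run, it can help to look at the intermediate columns. Here is a sketch against the samples table from the question. The names rn_all and rn_grp are introduced here for illustration, and the query orders by the original integer columns rather than date_time, which assumes yyyymmdd / HHMM integers sort chronologically.

```sql
-- Sketch: expose the two row numbers and their difference for inspection.
select sample_date,
       sample_time,
       device_id,
       sample_value,
       -- position of the row in the whole table
       row_number() over (order by sample_date, sample_time) as rn_all,
       -- position of the row among rows with the same device and value
       row_number() over (partition by device_id, sample_value
                          order by sample_date, sample_time) as rn_grp,
       -- constant while the same device/value pair repeats without interruption
       row_number() over (order by sample_date, sample_time)
       - row_number() over (partition by device_id, sample_value
                            order by sample_date, sample_time) as rnDiff
from samples
order by sample_date, sample_time;
```

On the test data, rows 1013 and 1014 (device 222, value 11) both get rnDiff = 3, so they form one run. Rows 1015 (device 222, value 12) and 1016 (device 111, value 12) both get rnDiff = 5 even though they belong to different devices, which is why the count over rnDiff should be taken per device_id and sample_value.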