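I have a table of measurements. One measurement is taken every minute. I need to select the rows in which the same sample_value occurs several times in a row for the same device_id.
Here is the initial data: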
sample_date sample_time device_id sample_value
20180701 1010 111 11
20180701 1011 111 12
20180701 1012 111 13
20180701 1013 222 11
20180701 1014 222 11
20180701 1015 222 12
20180701 1016 111 12
20180701 1017 111 11
20180701 1018 222 13
20180701 1019 222 12
20180701 1020 222 13
20180701 1021 222 12
20180701 1022 222 12
20180701 1023 111 12
20180701 1024 111 13
20180701 1025 111 13
20180701 1026 111 12
20180701 1027 111 13
20180701 1028 222 14
20180701 1029 222 13
20180701 1030 222 14
20180701 1031 222 14
20180701 1032 222 14
20180701 1033 222 14
20180701 1034 222 14
20180701 1035 222 14
20180701 1036 111 13
20180701 1037 111 13
20180701 1038 111 14
20180701 1039 111 13
This is the result I am looking for:
sample_date sample_time device_id sample_value
20180701 1013 222 11
20180701 1014 222 11
20180701 1021 222 12
20180701 1022 222 12
20180701 1024 111 13
20180701 1025 111 13
20180701 1030 222 14
20180701 1031 222 14
20180701 1032 222 14
20180701 1033 222 14
20180701 1034 222 14
20180701 1035 222 14
20180701 1036 111 13
20180701 1037 111 13
Here is the test data:
IF OBJECT_ID('samples', 'U') IS NOT NULL
DROP TABLE samples;
create table samples (
sample_date int,
sample_time int,
device_id int,
sample_value int
)
insert samples
values
(20180701, 1010, 111, 11)
,(20180701, 1011, 111, 12)
,(20180701, 1012, 111, 13)
,(20180701, 1013, 222, 11)
,(20180701, 1014, 222, 11)
,(20180701, 1015, 222, 12)
,(20180701, 1016, 111, 12)
,(20180701, 1017, 111, 11)
,(20180701, 1018, 222, 13)
,(20180701, 1019, 222, 12)
,(20180701, 1020, 222, 13)
,(20180701, 1021, 222, 12)
,(20180701, 1022, 222, 12)
,(20180701, 1023, 111, 12)
,(20180701, 1024, 111, 13)
,(20180701, 1025, 111, 13)
,(20180701, 1026, 111, 12)
,(20180701, 1027, 111, 13)
,(20180701, 1028, 222, 14)
,(20180701, 1029, 222, 13)
,(20180701, 1030, 222, 14)
,(20180701, 1031, 222, 14)
,(20180701, 1032, 222, 14)
,(20180701, 1033, 222, 14)
,(20180701, 1034, 222, 14)
,(20180701, 1035, 222, 14)
,(20180701, 1036, 111, 13)
,(20180701, 1037, 111, 13)
,(20180701, 1038, 111, 14)
,(20180701, 1039, 111, 13)
select * from samples
This is the SQL I am trying to use, but I don't know how to set up the right partition.
select *
from (select sample_date,
sample_time,
device_id,
sample_value,
row_number() over (partition by sample_date,
device_id,
sample_value
order by sample_date,
sample_time,
device_id) as occurrence
from samples) t
where occurrence > 1
Related topics:
Select statement to find duplicates on certain fields
How to find consecutive rows based on the value of a column?
Answer 0 (score: 1)
If you want to do this without using LEAD or LAG, you can do the following instead:
WITH Ordered AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY sample_date, sample_time) AS order_id
FROM
samples)
SELECT
s1.sample_date,
s1.sample_time,
s1.device_id,
s1.sample_value
FROM
Ordered s1
INNER JOIN Ordered s2 ON s2.device_id = s1.device_id AND s2.sample_value = s1.sample_value AND s2.order_id = s1.order_id + 1
UNION
SELECT
s2.sample_date,
s2.sample_time,
s2.device_id,
s2.sample_value
FROM
Ordered s1
INNER JOIN Ordered s2 ON s2.device_id = s1.device_id AND s2.sample_value = s1.sample_value AND s2.order_id = s1.order_id + 1
ORDER BY
1, 2;
The result is:
sample_date sample_time device_id sample_value
20180701 1013 222 11
20180701 1014 222 11
20180701 1021 222 12
20180701 1022 222 12
20180701 1024 111 13
20180701 1025 111 13
20180701 1030 222 14
20180701 1031 222 14
20180701 1032 222 14
20180701 1033 222 14
20180701 1034 222 14
20180701 1035 222 14
20180701 1036 111 13
20180701 1037 111 13
Answer 1 (score: 0)
I think you want to use lag() / lead():
select s.*
from (select s.*,
lag(device_id) over (order by sample_date, sample_time) as prev_di,
lead(device_id) over (order by sample_date, sample_time) as next_di,
lag(sample_value) over (order by sample_date, sample_time) as prev_sv,
lead(sample_value) over (order by sample_date, sample_time) as next_sv
from samples s
) s
where (prev_sv = sample_value and prev_di = device_id) or
(next_sv = sample_value and next_di = device_id);
Here is a SQL Fiddle.
If you specifically want the adjacent row to be the next time unit, you can use exists:
select s.*
from samples s
where exists (select 1
from samples s2
where s2.device_id = s.device_id and
s2.sample_value = s.sample_value and
s2.sample_date = s.sample_date and
s2.sample_time in (s.sample_time - 1, s.sample_time + 1)
);
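One caveat with the exists approach: sample_time is stored as an HHMM integer, so sample_time + 1 does not reach the next minute across an hour boundary (1059 + 1 is 1060, not 1100). A minimal sketch of one way around this, assuming SQL Server 2012 or later, is to build a real datetime with DATETIMEFROMPARTS and compare with DATEDIFF. The CTE name stamped and the column sample_ts are illustrative names, not part of the answer above.

```sql
-- Sketch only: compare real timestamps instead of HHMM integers.
with stamped as (
    select s.*,
           -- sample_ts is an illustrative column name
           DATETIMEFROMPARTS(sample_date / 10000, (sample_date / 100) % 100, sample_date % 100,
                             sample_time / 100, sample_time % 100, 0, 0) as sample_ts
    from samples s
)
select s.sample_date, s.sample_time, s.device_id, s.sample_value
from stamped s
where exists (select 1
              from stamped s2
              where s2.device_id = s.device_id and
                    s2.sample_value = s.sample_value and
                    -- exactly one minute before or after the current row
                    abs(datediff(minute, s.sample_ts, s2.sample_ts)) = 1
             )
order by s.sample_date, s.sample_time;
```

Because the comparison is on full timestamps, this variant also pairs rows on either side of midnight, which the sample_date equality test would not.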
Answer 2 (score: 0)
You can try the following query:
select date_time,
device_id,
sample_value
from (
select date_time,
device_id,
sample_value,
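-- rnDiff alone can coincide for different (device_id, sample_value) pairs, so partition by all three
COUNT(*) over (partition by device_id, sample_value, rnDiff) cnt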
from (
select date_time,
device_id,
sample_value,
ROW_NUMBER() over (order by date_time) -
ROW_NUMBER() over (partition by device_id, sample_value order by date_time) rnDiff
from (
select DATETIMEFROMPARTS(sample_date/10000,(sample_date/100)%100,sample_date%100,sample_time/100,sample_time%100,0,0) date_time,
device_id,
sample_value
from samples
) a
) a
) a where cnt > 1
order by date_time
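In the innermost query I convert your date and time columns to a datetime value so that the rows are easy to sort. I then use the difference between two row_number() values (rnDiff) to tell apart runs of consecutive rows that share the same device_id and sample_value. Finally, a COUNT(*) window partitioned on rnDiff counts the rows in each run, and the outermost query keeps only the runs with more than one row.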
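To see how the row-number difference behaves, the following sketch exposes both row numbers for a few rows of the test data. The aliases rn_all and rn_grp are illustrative names that do not appear in the query above, and the sketch orders by the original sample_date and sample_time columns instead of a converted datetime.

```sql
-- Sketch: show the two row numbers and their difference for four rows.
select sample_time,
       device_id,
       sample_value,
       rn_all,
       rn_grp,
       rn_all - rn_grp as rnDiff
from (
    select sample_date,
           sample_time,
           device_id,
           sample_value,
           -- position of the row among all rows
           ROW_NUMBER() over (order by sample_date, sample_time) as rn_all,
           -- position of the row among rows with the same device and value
           ROW_NUMBER() over (partition by device_id, sample_value
                              order by sample_date, sample_time) as rn_grp
    from samples
) a
where sample_time between 1013 and 1016
order by sample_date, sample_time;
-- sample_time device_id sample_value rn_all rn_grp rnDiff
-- 1013        222       11           4      1      3
-- 1014        222       11           5      2      3
-- 1015        222       12           6      1      5
-- 1016        111       12           7      2      5
```

Rows 1013 and 1014 form a genuine run and share rnDiff = 3. Rows 1015 and 1016 also share rnDiff = 5 although they belong to different devices, so rnDiff identifies a run only in combination with device_id and sample_value.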