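I have a table like the one shown below:
I need to group the data by Cat and Timestamp and count the groups. A group is defined by a dynamic 5-minute time window, which means a group can span different hours.
The query result should provide the following:
See the groups highlighted in yellow in the first table. Each highlighted group should be detected and counted as one, and each non-highlighted row should also be counted as one.
I have read many solutions on Stack Overflow; these are the relevant ones I tried:
I would greatly appreciate any help with this.
For the raw data as an ASCII table, see below.
Raw data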
+---------------------+----------+
| Timestamp | Category |
+---------------------+----------+
| 2018-10-01 04:06:12 | Cat1 |
| 2018-10-01 05:07:18 | Cat1 |
| 2018-10-01 05:07:19 | Cat1 |
| 2018-10-01 05:07:20 | Cat1 |
| 2018-10-01 06:09:29 | Cat1 |
| 2018-10-01 07:24:12 | Cat2 |
| 2018-10-01 07:30:43 | Cat2 |
| 2018-10-01 07:59:13 | Cat2 |
| 2018-10-01 08:02:15 | Cat2 |
| 2018-10-01 10:09:25 | Cat2 |
| 2018-10-01 11:13:42 | Cat2 |
+---------------------+----------+
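Answer 0: (score: 2)
Here is one way to do it.
In the first step, classify each record by whether the previous timestamp in the same category is within 5 minutes of it. If it is not, or if there is no previous row, the record starts a new group and is assigned its row_number; otherwise grps_of_5 is left NULL.
That gives you values like the following: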
+---------------------+----------+-----------+
| timestamp1 | category | grps_of_5 |
+---------------------+----------+-----------+
| 01/10/2018 04:06:12 | Cat1 | 1 |
| 01/10/2018 05:07:18 | Cat1 | 2 |
| 01/10/2018 05:07:19 | Cat1 | |
| 01/10/2018 05:07:20 | Cat1 | |
| 01/10/2018 06:09:29 | Cat1 | 5 |
| 01/10/2018 07:24:12 | Cat2 | 1 |
| 01/10/2018 07:30:43 | Cat2 | 2 |
| 01/10/2018 07:59:13 | Cat2 | 3 |
| 01/10/2018 08:02:15 | Cat2 | |
| 01/10/2018 10:09:25 | Cat2 | 5 |
| 01/10/2018 11:13:42 | Cat2 | 6 |
+---------------------+----------+-----------+
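To make the first step easier to follow outside the database, here is a minimal Python sketch of the same flagging logic applied to the raw data from the question. The helper name flag_group_starts is hypothetical, and the sketch assumes the gap is measured as exact elapsed time rather than with datediff(mi, ...).

```python
from datetime import datetime, timedelta

# Rows copied from the raw data table in the question.
ROWS = [
    ("2018-10-01 04:06:12", "Cat1"),
    ("2018-10-01 05:07:18", "Cat1"),
    ("2018-10-01 05:07:19", "Cat1"),
    ("2018-10-01 05:07:20", "Cat1"),
    ("2018-10-01 06:09:29", "Cat1"),
    ("2018-10-01 07:24:12", "Cat2"),
    ("2018-10-01 07:30:43", "Cat2"),
    ("2018-10-01 07:59:13", "Cat2"),
    ("2018-10-01 08:02:15", "Cat2"),
    ("2018-10-01 10:09:25", "Cat2"),
    ("2018-10-01 11:13:42", "Cat2"),
]


def flag_group_starts(rows, window=timedelta(minutes=5)):
    """Hypothetical helper: return (category, timestamp, grps_of_5) per row.

    grps_of_5 is the per-category row number for rows that start a group
    and None for rows that continue the previous group.
    """
    parsed = sorted(
        (cat, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for ts, cat in rows
    )
    result = []
    prev_cat, prev_ts, row_number = None, None, 0
    for cat, ts in parsed:
        if cat != prev_cat:  # new partition: restart numbering, no predecessor
            row_number, prev_ts = 0, None
        row_number += 1
        # Assumption: gap measured as exact elapsed time, not datediff(mi, ...).
        starts_group = prev_ts is None or ts - prev_ts > window
        result.append((cat, ts, row_number if starts_group else None))
        prev_cat, prev_ts = cat, ts
    return result


for cat, ts, grp in flag_group_starts(ROWS):
    print(ts, cat, grp)
```

For the question's raw data the third column comes out as 1, 2, None, None, 5 for Cat1 and 1, 2, 3, None, 5, 6 for Cat2.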
After that I "copy" the values down to fill the NULLs within each group using
max(grps_of_5) over(partition by category order by timestamp1)
This is done in the curated_data block and looks like this:
+---------------------+----------+-----------+---------+
| timestamp1 | category | grps_of_5 | max_val |
+---------------------+----------+-----------+---------+
| 01/10/2018 04:06:12 | Cat1 | 1 | 1 |
| 01/10/2018 05:07:18 | Cat1 | 2 | 2 |
| 01/10/2018 05:07:19 | Cat1 | | 2 |
| 01/10/2018 05:07:20 | Cat1 | | 2 |
| 01/10/2018 06:09:29 | Cat1 | 5 | 5 |
| 01/10/2018 07:24:12 | Cat2 | 1 | 1 |
| 01/10/2018 07:30:43 | Cat2 | 2 | 2 |
| 01/10/2018 07:59:13 | Cat2 | 3 | 3 |
| 01/10/2018 08:02:15 | Cat2 | | 3 |
| 01/10/2018 10:09:25 | Cat2 | 5 | 5 |
| 01/10/2018 11:13:42 | Cat2 | 6 | 6 |
+---------------------+----------+-----------+---------+
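The fill step works because a windowed max with an order by is a running max that skips NULLs, and the row numbers only ever increase. A small Python sketch of that idea, using the grps_of_5 columns from the table as input (the function name running_max is just for illustration):

```python
def running_max(values):
    """Running max that ignores None, similar to
    max(x) over(partition by ... order by ...) skipping NULLs."""
    out, current = [], None
    for v in values:
        if v is not None and (current is None or v > current):
            current = v
        out.append(current)
    return out


cat1 = [1, 2, None, None, 5]      # grps_of_5 for Cat1
cat2 = [1, 2, 3, None, 5, 6]      # grps_of_5 for Cat2

print(running_max(cat1))  # [1, 2, 2, 2, 5]
print(running_max(cat2))  # [1, 2, 3, 3, 5, 6]
```

Every row now carries the number of the row that started its group, so counting distinct values per category counts the groups.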
After that I count the distinct max_val values per category, which counts each run of rows within 5 minutes of each other as a single group and every other row separately.
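with raw_data as (
    select timestamp1
          ,category
          -- A row starts a new group when it has no predecessor in its category
          -- or when the gap to the previous row is more than 5 minutes.
          -- Note: datediff(mi, ...) counts minute boundaries crossed, not elapsed seconds.
          ,case when datediff(mi, lag(timestamp1) over(partition by category order by timestamp1), timestamp1) > 5
                  or lag(timestamp1) over(partition by category order by timestamp1) is null
                then row_number() over(partition by category order by timestamp1)
           end as grps_of_5
      from t
)
,curated_data as (
    -- The running max carries the last group-start number forward over the NULLs.
    select max(grps_of_5) over(partition by category order by timestamp1) as max_val
          ,x.*
      from raw_data x
)
select category, count(distinct max_val) as cnt
  from curated_data
 group by category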
+----------+------+
| category | cnt |
+----------+------+
| Cat1 | 3 |
| Cat2 | 5 |
+----------+------+
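One detail worth keeping in mind with the query above: in SQL Server, datediff(mi, a, b) counts the minute boundaries crossed between a and b, not the elapsed time. The Python sketch below emulates that behaviour to show where the two differ; datediff_minutes is a hypothetical helper and the two timestamps are made up for illustration, not taken from the question's data.

```python
from datetime import datetime


def datediff_minutes(start, end):
    """Emulate T-SQL DATEDIFF(mi, start, end): minute boundaries crossed."""
    def floor_to_minute(d):
        return d.replace(second=0, microsecond=0)

    delta = floor_to_minute(end) - floor_to_minute(start)
    return int(delta.total_seconds() // 60)


# Illustrative timestamps (assumed, not from the question).
a = datetime(2018, 10, 1, 7, 24, 0)
b = datetime(2018, 10, 1, 7, 29, 59)

print(datediff_minutes(a, b))   # 5, so the "> 5" test is false
print((b - a).total_seconds())  # 359.0, i.e. 5 min 59 s elapsed
```

So two rows 5 minutes 59 seconds apart can still land in the same group. If exact elapsed time matters, comparing the previous timestamp against dateadd(mi, -5, timestamp1) is one way to avoid the rounding.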
Demo link (edited version):
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=556e0ec16bb040b96b637e3da3e8178b
Answer 1: (score: 1)
This is easily done with LAG:
DECLARE @t TABLE (timestamp DATETIME, category VARCHAR(100));
INSERT INTO @t VALUES
('2018-10-01 04:06:12', 'CAT1'),
('2018-10-01 05:07:18', 'CAT1'),
('2018-10-01 05:07:19', 'CAT1'),
('2018-10-01 05:07:20', 'CAT1'),
('2018-10-01 06:09:29', 'CAT1'),
('2018-10-01 07:24:12', 'CAT2'),
('2018-10-01 07:30:43', 'CAT2'),
('2018-10-01 07:59:13', 'CAT2'),
('2018-10-01 08:02:15', 'CAT2'),
('2018-10-01 10:09:25', 'CAT2'),
('2018-10-01 11:13:42', 'CAT2');
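WITH cte1 AS (
    SELECT timestamp,
           category,
           -- chg = 1 marks the first row of a group: there is no previous row in the
           -- category, or the previous row is 5 minutes or more earlier.
           CASE WHEN LAG(timestamp) OVER (PARTITION BY category ORDER BY timestamp) > DATEADD(MINUTE, -5, timestamp)
                THEN 0 ELSE 1 END AS chg
    FROM @t
)
SELECT category, COUNT(CASE WHEN chg = 1 THEN 1 END) AS cnt
FROM cte1
GROUP BY category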
To understand how this works, focus on how the chg column is computed and look at the result of the CTE:
timestamp category chg
2018-10-01 04:06:12.000 CAT1 1
2018-10-01 05:07:18.000 CAT1 1
2018-10-01 05:07:19.000 CAT1 0
2018-10-01 05:07:20.000 CAT1 0
2018-10-01 06:09:29.000 CAT1 1
2018-10-01 07:24:12.000 CAT2 1
2018-10-01 07:30:43.000 CAT2 1
2018-10-01 07:59:13.000 CAT2 1
2018-10-01 08:02:15.000 CAT2 0
2018-10-01 10:09:25.000 CAT2 1
2018-10-01 11:13:42.000 CAT2 1
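The same chg logic can be traced in a few lines of Python, which may help when checking the flags by hand. This is only a sketch of the idea: count_groups is a hypothetical name, and the rows are the ones inserted into @t above.

```python
from collections import Counter
from datetime import datetime, timedelta

# Same rows as the INSERT INTO @t statement.
ROWS = [
    ("2018-10-01 04:06:12", "CAT1"),
    ("2018-10-01 05:07:18", "CAT1"),
    ("2018-10-01 05:07:19", "CAT1"),
    ("2018-10-01 05:07:20", "CAT1"),
    ("2018-10-01 06:09:29", "CAT1"),
    ("2018-10-01 07:24:12", "CAT2"),
    ("2018-10-01 07:30:43", "CAT2"),
    ("2018-10-01 07:59:13", "CAT2"),
    ("2018-10-01 08:02:15", "CAT2"),
    ("2018-10-01 10:09:25", "CAT2"),
    ("2018-10-01 11:13:42", "CAT2"),
]


def count_groups(rows, window=timedelta(minutes=5)):
    """Sum the chg flags per category, mirroring the CASE expression."""
    parsed = sorted(
        (cat, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for ts, cat in rows
    )
    counts = Counter()
    previous = {}  # last timestamp seen per category, playing the role of LAG
    for cat, ts in parsed:
        last = previous.get(cat)
        # LAG(ts) > DATEADD(MINUTE, -5, ts) -> chg = 0, otherwise chg = 1.
        # A missing previous row (NULL in SQL) falls through to chg = 1.
        chg = 0 if last is not None and last > ts - window else 1
        counts[cat] += chg
        previous[cat] = ts
    return dict(counts)


print(count_groups(ROWS))  # {'CAT1': 3, 'CAT2': 5}
```

Because each group contributes exactly one row with chg = 1, summing the flags per category gives the number of groups.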
Answer 2: (score: 0)
Please try the following code:
SELECT * INTO #temp
FROM(
SELECT '2018-10-01 05:06:12' AS Timestamp , 'Cat1' AS Category
UNION ALL
SELECT '2018-10-01 05:07:18' AS Timestamp , 'Cat1' AS Category
UNION ALL
SELECT '2018-10-01 05:07:19' AS Timestamp , 'Cat1' AS Category
UNION ALL
SELECT '2018-10-01 05:07:20' AS Timestamp , 'Cat1' AS Category
UNION ALL
SELECT '2018-10-01 06:09:29' AS Timestamp , 'Cat1' AS Category
UNION ALL
SELECT '2018-10-01 07:24:12' AS Timestamp , 'Cat2' AS Category
UNION ALL
SELECT '2018-10-01 07:30:43' AS Timestamp , 'Cat2' AS Category
UNION ALL
SELECT '2018-10-01 07:59:13' AS Timestamp , 'Cat2' AS Category
UNION ALL
SELECT '2018-10-01 08:02:15' AS Timestamp , 'Cat2' AS Category
UNION ALL
SELECT '2018-10-01 10:09:25' AS Timestamp , 'Cat2' AS Category
UNION ALL
SELECT '2018-10-01 11:13:42' AS Timestamp , 'Cat2' AS Category
) AS T
SELECT Category AS [Group],
       COUNT(CONVERT(DATE, Timestamp)) AS [Count]
FROM #temp
GROUP BY Category