Grouping records into clumps by time - SQL Server 2008

Date: 2013-11-13 16:01:19

Tags: sql-server-2008 tsql grouping group-by

I have a table that stores Bluetooth detection information. For example:

MACaddress         | DetectorID | PollingIntervalStart     | PollingIntervalEnd
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:09.000  | 2012-03-26 16:51:19.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:24.000  | 2012-03-26 16:51:28.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:35.000  | 2012-03-26 16:51:49.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:55.000  | 2012-03-26 16:52:09.000
00:00:00:00:32:11  |    3       | 2012-03-26 17:00:43.000  | 2012-03-26 17:01:19.000
00:00:00:00:20:F1  |    1       | 2012-03-26 17:02:52.000  | 2012-03-26 16:53:02.000
...

00:00:00:00:00:01  |    3       | 2012-03-26 19:21:19.000  | 2012-03-26 19:21:48.000
00:00:00:00:00:01  |    3       | 2012-03-26 19:21:59.000  | 2012-03-26 19:22:51.000
00:00:00:00:00:01  |    3       | 2012-03-26 19:22:19.000  | 2012-03-26 19:22:31.000
00:00:00:00:20:F1  |    1       | 2012-03-26 20:23:49.000  | 2012-03-26 19:50:30.000

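DetectorID is the ID of the Bluetooth detector that polled the device. As you can see, a device sometimes lingers within a detector's polling radius, so we get a cluster of detections for the same device. What I want to do is group each cluster and take its first detection, meaning min(DetectionTime) (say we define a cluster as the same device being polled several times within three minutes). Note that the length of a detector's polling interval is not constant. For example, for the cluster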

00:00:00:00:00:01  |    3       | 2012-03-26 16:51:09.000  | 2012-03-26 16:51:19.000 -- take this record
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:24.000  | 2012-03-26 16:51:28.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:35.000  | 2012-03-26 16:51:49.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:55.000  | 2012-03-26 16:52:09.000

I want to get only the first record. After grouping as described above, the table should look like this:

MACaddress         | DetectorID | PollingIntervalStart     | PollingIntervalEnd
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:09.000  | 2012-03-26 16:51:19.000
00:00:00:00:32:11  |    3       | 2012-03-26 17:00:43.000  | 2012-03-26 17:01:19.000
00:00:00:00:20:F1  |    1       | 2012-03-26 17:02:52.000  | 2012-03-26 16:53:02.000
...

00:00:00:00:00:01  |    3       | 2012-03-26 19:21:19.000  | 2012-03-26 19:21:48.000
00:00:00:00:20:F1  |    1       | 2012-03-26 20:23:49.000  | 2012-03-26 19:50:30.000

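I tried GROUP BY, ROW_NUMBER, RANK, and DENSE_RANK, but I can't seem to figure it out. I also tried using a tally table to build time intervals and join on them, but that didn't work. Any help is appreciated. Thanks.

Edit

By "clumps" I mean that if the same device is detected several times within a short period, those detections count as one clump. I define the interval as 3 minutes. The length is arbitrary; it could be any number of minutes, but I chose 3. So if a MAC address is detected at 3:00:22, 3:00:34, and 3:01:44, all three detections are considered one clump. If it is detected at 3:00:22 and 3:07:32, that is not a clump.

It has to be the first detection of the clump. If you have code that returns the last detection of the clump, you can post that too. Perhaps I can then use ROW_NUMBER with a descending order to get the desired output.

Edit 2

I changed Aaron's code so that the cluster length is no longer constant. The code now checks only the separation between clusters, so detections more than 3 minutes apart are not considered part of the same cluster. This new definition of a cluster makes the code simpler.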

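2 Answers:

Answer 0: (score: 2)

Given this sample data (I've corrected the last row, where the start time was later than the end time, which seemed incorrect):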

DECLARE @d TABLE
(
  MACaddress VARCHAR(32), 
  DetectorID INT, 
  PollingIntervalStart DATETIME2(0), 
  PollingIntervalEnd DATETIME2(0)
);

INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');

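This idea gets the last row of each clump. As I said, I think what you want is certainly possible, but I have to move on. This would definitely be easier in SQL Server 2012, which added a number of window functions (such as LAG and LEAD).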

;WITH x AS 
(
  SELECT *, rn = ROW_NUMBER() OVER 
    (PARTITION BY MacAddress, DetectorID ORDER BY PollingIntervalStart)
  FROM @d
)
SELECT * FROM x 
WHERE NOT EXISTS 
(
  SELECT 1 FROM x AS x2 
  WHERE x2.MACaddress = x.MacAddress
  AND x2.DetectorID = x.DetectorID
  AND x2.rn = x.rn + 1
  AND x2.PollingIntervalStart <= DATEADD(MINUTE, 3, x.PollingIntervalStart)
)
ORDER BY x.PollingIntervalStart;

Result:

MACaddress         DetectorID  PollingIntervalStart  PollingIntervalEnd   rn
-----------------  ----------  --------------------  -------------------  --
00:00:00:00:00:01  3           2012-03-26 16:51:55   2012-03-26 16:52:09  4
00:00:00:00:32:11  3           2012-03-26 17:00:43   2012-03-26 17:01:19  1
00:00:00:00:20:F1  1           2012-03-26 17:02:52   2012-03-26 16:53:02  1
00:00:00:00:00:01  3           2012-03-26 19:22:19   2012-03-26 19:22:31  7
00:00:00:00:20:F1  1           2012-03-26 19:49:49   2012-03-26 19:50:30  2
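
The remark about SQL Server 2012 can be made concrete. The following is a minimal sketch, assuming SQL Server 2012 or later (LAG does not exist in 2008) and a reduced copy of the sample rows. The alias PrevStart and the 180-second threshold are illustrative choices, not part of the original answer. Unlike the query above, it returns the first row of each clump rather than the last.

```sql
-- Sketch only: requires SQL Server 2012+, because LAG is not available in 2008.
DECLARE @d TABLE
(
  MACaddress VARCHAR(32),
  DetectorID INT,
  PollingIntervalStart DATETIME2(0),
  PollingIntervalEnd DATETIME2(0)
);

-- A subset of the sample rows used in this thread.
INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09','2012-03-26 16:51:19'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24','2012-03-26 16:51:28'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43','2012-03-26 17:01:19'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19','2012-03-26 19:21:48'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59','2012-03-26 19:22:51');

WITH x AS
(
  -- PrevStart (hypothetical alias) is the previous detection's start time
  -- for the same device and detector, or NULL for the first detection.
  SELECT *, PrevStart = LAG(PollingIntervalStart) OVER
    (PARTITION BY MACaddress, DetectorID ORDER BY PollingIntervalStart)
  FROM @d
)
SELECT MACaddress, DetectorID, PollingIntervalStart, PollingIntervalEnd
FROM x
WHERE PrevStart IS NULL                                        -- no earlier detection
   OR DATEDIFF(SECOND, PrevStart, PollingIntervalStart) > 180  -- gap of more than 3 minutes
ORDER BY PollingIntervalStart;
```

With these five rows it should return the detections starting at 16:51:09, 17:00:43, and 19:21:19. Because each row is compared only with its immediate predecessor, this follows the "separation between clusters" definition rather than a fixed cluster length.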

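Another idea gets the result you want, but uses a cursor. Personally, I think there are cases where a cursor is perfectly acceptable (also see this discussion on running totals pre-2012, and keep in mind the caveat that you should use proper cursor options), but other people refuse to even look at them. Whether this is practical depends on the size of the data; you should test it.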

DECLARE @newTable TABLE
(
  MACaddress VARCHAR(32), 
  DetectorID INT, 
  PollingIntervalStart DATETIME2(0), 
  PollingIntervalEnd DATETIME2(0)
);

DECLARE @PreviousTime DATETIME2(0) = NULL, @ma VARCHAR(32), @de INT, 
  @st DATETIME2(0), @et DATETIME2(0), @rn INT;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR 
  SELECT *, rn = ROW_NUMBER() OVER 
    (PARTITION BY MacAddress, DetectorID ORDER BY PollingIntervalStart)
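    -- Order by the full partition key so @PreviousTime always comes from the same device and detector
    FROM @d ORDER BY MacAddress, DetectorID, rn;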

OPEN c;

FETCH c INTO @ma, @de, @st, @et, @rn;

WHILE @@FETCH_STATUS = 0
BEGIN
  -- Compare in seconds: DATEDIFF(MINUTE, ...) counts minute boundaries crossed, not elapsed minutes
  IF @rn = 1 OR DATEDIFF(SECOND, @PreviousTime, @st) > 180
  BEGIN
    INSERT @newTable SELECT @ma, @de, @st, @et;
  END

  SELECT @PreviousTime = @st;

  FETCH c INTO @ma, @de, @st, @et, @rn;
END

SELECT * FROM @newTable ORDER BY PollingIntervalStart;

CLOSE c; DEALLOCATE c;

Result:

MACaddress         DetectorID  PollingIntervalStart  PollingIntervalEnd
-----------------  ----------  --------------------  -------------------
00:00:00:00:00:01  3           2012-03-26 16:51:09   2012-03-26 16:51:19
00:00:00:00:32:11  3           2012-03-26 17:00:43   2012-03-26 17:01:19
00:00:00:00:20:F1  1           2012-03-26 17:02:52   2012-03-26 16:53:02
00:00:00:00:00:01  3           2012-03-26 19:21:19   2012-03-26 19:21:48
00:00:00:00:20:F1  1           2012-03-26 19:49:49   2012-03-26 19:50:30

Answer 1: (score: 0)

I found the answer by slightly modifying Aaron Bertrand's answer.

Setting up the table:

DECLARE @d TABLE
(
  MACaddress VARCHAR(32), 
  DetectorID INT, 
  PollingIntervalStart DATETIME2(0), 
  PollingIntervalEnd DATETIME2(0)
);

INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');

I made two changes to Aaron's code. I ordered the ROW_NUMBER in the CTE in descending order. In the WHERE NOT EXISTS condition, I replaced the DATEADD check with DATEDIFF(MINUTE, x2.PollingIntervalStart, x.PollingIntervalStart) < 3.

;WITH x AS 
(
    SELECT 
    *, 
    ROW_NUMBER() OVER 
        (PARTITION BY MacAddress, DetectorID ORDER BY PollingIntervalStart DESC) AS RN
    FROM @d
)
select * from x
WHERE NOT EXISTS 
(
  SELECT 1 FROM x AS x2 
  WHERE x2.MACaddress = x.MacAddress
  AND x2.DetectorID = x.DetectorID
  AND x2.rn = x.rn + 1
  -- x2.PollingIntervalStart is never later than x.PollingIntervalStart because of the x2.rn = x.rn + 1 condition
  -- this works because the cte query is ordered in descending order
  AND DATEDIFF(MINUTE, x2.PollingIntervalStart, x.PollingIntervalStart) < 3 
)
ORDER BY x.PollingIntervalStart;
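
One caveat about a minute-based check like the one in the query above: DATEDIFF counts the datepart boundaries crossed between the two values rather than elapsed time, so two detections only 2 minutes 2 seconds apart can still report a difference of 3. A minimal sketch, using made-up timestamps chosen only to show the effect:

```sql
-- Hypothetical timestamps, 2 minutes 2 seconds apart.
DECLARE @a DATETIME2(0) = '2012-03-26 16:51:59',
        @b DATETIME2(0) = '2012-03-26 16:54:01';

SELECT
  MinuteBoundaries = DATEDIFF(MINUTE, @a, @b), -- 3: the clock crossed 16:52, 16:53 and 16:54
  ElapsedSeconds   = DATEDIFF(SECOND, @a, @b); -- 122
```

Under a DATEDIFF(MINUTE, ...) < 3 test these two detections would land in separate clumps even though they are less than three minutes apart. If that matters for your data, comparing with DATEDIFF(SECOND, ...) <= 180 is one possible alternative.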

Thanks, Aaron.