Question

I have a table that stores Bluetooth detection data. For example:

MACaddress         | DetectorID | PollingIntervalStart     | PollingIntervalEnd
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:09.000  | 2012-03-26 16:51:19.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:24.000  | 2012-03-26 16:51:28.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:35.000  | 2012-03-26 16:51:49.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:55.000  | 2012-03-26 16:52:09.000
00:00:00:00:32:11  |    3       | 2012-03-26 17:00:43.000  | 2012-03-26 17:01:19.000
00:00:00:00:20:F1  |    1       | 2012-03-26 17:02:52.000  | 2012-03-26 16:53:02.000
...

00:00:00:00:00:01  |    3       | 2012-03-26 19:21:19.000  | 2012-03-26 19:21:48.000
00:00:00:00:00:01  |    3       | 2012-03-26 19:21:59.000  | 2012-03-26 19:22:51.000
00:00:00:00:00:01  |    3       | 2012-03-26 19:22:19.000  | 2012-03-26 19:22:31.000
00:00:00:00:20:F1  |    1       | 2012-03-26 20:23:49.000  | 2012-03-26 19:50:30.000

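DetectorID is the ID of the Bluetooth detector that polled the device. As you can see, a device sometimes lingers within a detector's polling radius, so we get clumps of detections for the same device. What I want to do is group each clump and take its first detection (meaning min(DetectionTime)), where a clump is, say, the same device being polled multiple times within three minutes. Note that the length of a detector's polling interval is not constant. For example, for the clump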

00:00:00:00:00:01  |    3       | 2012-03-26 16:51:09.000  | 2012-03-26 16:51:19.000 -- take this record
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:24.000  | 2012-03-26 16:51:28.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:35.000  | 2012-03-26 16:51:49.000
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:55.000  | 2012-03-26 16:52:09.000

I want to get only the first record. After grouping as described above, the table should look like this:

MACaddress         | DetectorID | PollingIntervalStart     | PollingIntervalEnd
00:00:00:00:00:01  |    3       | 2012-03-26 16:51:09.000  | 2012-03-26 16:51:19.000
00:00:00:00:32:11  |    3       | 2012-03-26 17:00:43.000  | 2012-03-26 17:01:19.000
00:00:00:00:20:F1  |    1       | 2012-03-26 17:02:52.000  | 2012-03-26 16:53:02.000
...

00:00:00:00:00:01  |    3       | 2012-03-26 19:21:19.000  | 2012-03-26 19:21:48.000
00:00:00:00:20:F1  |    1       | 2012-03-26 20:23:49.000  | 2012-03-26 19:50:30.000

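I tried GROUP BY, ROW_NUMBER, RANK, and DENSE_RANK, but I can't seem to figure it out. I also tried using a tally table to build time intervals and joining on those intervals, but that didn't work. Any help is appreciated. Thanks.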

Edit

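By "clumps" I mean that if the same device is detected multiple times within a short period, those detections are considered one clump. I defined the interval as 3 minutes. The interval length is arbitrary; it could be any number of minutes, I just picked 3. So if a MAC address is detected at 3:00:22, 3:00:34, and 3:01:44, all three detections are considered one clump. If it is detected at 3:00:22 and 3:07:32, that is not a clump.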

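It has to be the first detection of the clump. If you have code that gets the last detection of the clump, you can post that too. Maybe I can use ROW_NUMBER with a descending order to get the desired output.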

Edit 2

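I changed Aaron's code so that the clump length is no longer a constant. The code now only checks the separation between clumps, so a detection more than 3 minutes after the previous one is not considered part of the same clump. This new definition of a clump made the code simpler.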

Answer 1

Given this sample data (I've corrected the rows where the start time > end time, which seemed incorrect):

DECLARE @d TABLE
(
  MACaddress VARCHAR(32), 
  DetectorID INT, 
  PollingIntervalStart DATETIME2(0), 
  PollingIntervalEnd DATETIME2(0)
);

INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');

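This idea gets the last row of each clump. As I said, I think getting the first row is certainly possible, but I have to move on. This would definitely be easier in SQL Server 2012, which adds a range of window functions such as LAG and LEAD.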

;WITH x AS 
(
  SELECT *, rn = ROW_NUMBER() OVER 
    (PARTITION BY MacAddress, DetectorID ORDER BY PollingIntervalStart)
  FROM @d
)
SELECT * FROM x 
WHERE NOT EXISTS 
(
  SELECT 1 FROM x AS x2 
  WHERE x2.MACaddress = x.MacAddress
  AND x2.DetectorID = x.DetectorID
  AND x2.rn = x.rn + 1
  AND x2.PollingIntervalStart <= DATEADD(MINUTE, 3, x.PollingIntervalStart)
)
ORDER BY x.PollingIntervalStart;

Results:

MACaddress         DetectorID  PollingIntervalStart  PollingIntervalEnd   rn
-----------------  ----------  --------------------  -------------------  --
00:00:00:00:00:01  3           2012-03-26 16:51:55   2012-03-26 16:52:09  4
00:00:00:00:32:11  3           2012-03-26 17:00:43   2012-03-26 17:01:19  1
00:00:00:00:20:F1  1           2012-03-26 17:02:52   2012-03-26 16:53:02  1
00:00:00:00:00:01  3           2012-03-26 19:22:19   2012-03-26 19:22:31  7
00:00:00:00:20:F1  1           2012-03-26 19:49:49   2012-03-26 19:50:30  2

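Another idea gets the results you want, but uses a cursor. Personally, I think there are cases where a cursor is perfectly acceptable (also see this discussion on running totals pre-2012, and keep in mind the caveat that you should use proper cursor options), but others refuse to even look at them. Whether this is practical depends on the size of the data; you should test it.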

DECLARE @newTable TABLE
(
  MACaddress VARCHAR(32), 
  DetectorID INT, 
  PollingIntervalStart DATETIME2(0), 
  PollingIntervalEnd DATETIME2(0)
);

DECLARE @PreviousTime DATETIME2(0) = NULL, @ma VARCHAR(32), @de INT, 
  @st DATETIME2(0), @et DATETIME2(0), @rn INT;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR 
  SELECT *, rn = ROW_NUMBER() OVER 
    (PARTITION BY MacAddress, DetectorID ORDER BY PollingIntervalStart)
    FROM @d ORDER BY MacAddress, DetectorID, rn;

OPEN c;

FETCH c INTO @ma, @de, @st, @et, @rn;

WHILE @@FETCH_STATUS = 0
BEGIN
  IF @rn = 1 OR (@rn > 1 AND DATEDIFF(MINUTE, @PreviousTime, @st) > 3)
  BEGIN
    INSERT @newTable SELECT @ma, @de, @st, @et;
  END

  SELECT @PreviousTime = @st;

  FETCH c INTO @ma, @de, @st, @et, @rn;
END

SELECT * FROM @newTable ORDER BY PollingIntervalStart;

CLOSE c; DEALLOCATE c;

Results:

MACaddress         DetectorID  PollingIntervalStart  PollingIntervalEnd
-----------------  ----------  --------------------  -------------------
00:00:00:00:00:01  3           2012-03-26 16:51:09   2012-03-26 16:51:19
00:00:00:00:32:11  3           2012-03-26 17:00:43   2012-03-26 17:01:19
00:00:00:00:20:F1  1           2012-03-26 17:02:52   2012-03-26 16:53:02
00:00:00:00:00:01  3           2012-03-26 19:21:19   2012-03-26 19:21:48
00:00:00:00:20:F1  1           2012-03-26 19:49:49   2012-03-26 19:50:30
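
The answer above notes that this gets easier in SQL Server 2012. As a rough sketch only, assuming SQL Server 2012 or later is available (LAG does not exist in 2008) and reusing the same sample data, the first row of each clump could be found without a cursor by comparing each detection with the previous one for the same device and detector. The column alias PrevStart is made up for illustration.

```sql
-- Sketch only: requires SQL Server 2012+ because of LAG.
DECLARE @d TABLE
(
  MACaddress VARCHAR(32),
  DetectorID INT,
  PollingIntervalStart DATETIME2(0),
  PollingIntervalEnd DATETIME2(0)
);

INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');

;WITH x AS
(
  SELECT *,
    -- Start time of the previous detection for the same device and detector;
    -- NULL when the row is the first detection in its partition.
    PrevStart = LAG(PollingIntervalStart) OVER
      (PARTITION BY MACaddress, DetectorID ORDER BY PollingIntervalStart)
  FROM @d
)
SELECT MACaddress, DetectorID, PollingIntervalStart, PollingIntervalEnd
FROM x
-- A row starts a clump if nothing precedes it, or if the previous
-- detection started more than 3 minutes earlier.
WHERE PrevStart IS NULL
   OR PollingIntervalStart > DATEADD(MINUTE, 3, PrevStart)
ORDER BY PollingIntervalStart;
```

On this sample data the query should return the same five rows as the cursor. Like the cursor, it measures the gap from the previous detection's start time, so a long run of closely spaced detections counts as a single clump.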

Answer 2

I found the answer by slightly modifying Aaron Bertrand's answer.

Set up the table:

DECLARE @d TABLE
(
  MACaddress VARCHAR(32), 
  DetectorID INT, 
  PollingIntervalStart DATETIME2(0), 
  PollingIntervalEnd DATETIME2(0)
);

INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');

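I made two changes to Aaron's code. First, I changed the ROW_NUMBER ordering in the CTE to descending. Second, in the WHERE NOT EXISTS condition, I replaced the DATEADD check with DATEDIFF(MINUTE, x2.PollingIntervalStart, x.PollingIntervalStart) < 3.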

;WITH x AS 
(
    SELECT 
    *, 
    ROW_NUMBER() OVER 
        (PARTITION BY MacAddress, DetectorID ORDER BY PollingIntervalStart DESC) AS RN
    FROM @d
)
select * from x
WHERE NOT EXISTS 
(
  SELECT 1 FROM x AS x2 
  WHERE x2.MACaddress = x.MacAddress
  AND x2.DetectorID = x.DetectorID
  AND x2.rn = x.rn + 1
  -- x2.PollingIntervalStart is never later than x.PollingIntervalStart because of the x2.rn = x.rn + 1 condition
  -- this works because the CTE numbers the rows in descending order of start time
  AND DATEDIFF(MINUTE, x2.PollingIntervalStart, x.PollingIntervalStart) < 3 
)
ORDER BY x.PollingIntervalStart;
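
One detail worth keeping in mind with the DATEDIFF-based condition: DATEDIFF(MINUTE, ...) counts the minute boundaries crossed between two values rather than the elapsed time, so it can disagree with a DATEADD comparison near the 3-minute threshold. The sketch below uses two hypothetical timestamps (not taken from the sample data) that are 2 minutes 2 seconds apart.

```sql
-- Hypothetical timestamps chosen to straddle three minute boundaries.
DECLARE @a DATETIME2(0) = '2012-03-26 03:00:59',
        @b DATETIME2(0) = '2012-03-26 03:03:01';  -- 2 min 2 s after @a

SELECT
  -- Boundaries crossed: 03:01, 03:02, 03:03, so this is 3, and "< 3" is false.
  MinuteBoundaries = DATEDIFF(MINUTE, @a, @b),
  -- Actual elapsed time in seconds: 122.
  ElapsedSeconds = DATEDIFF(SECOND, @a, @b),
  -- Elapsed-time test in the style of the DATEADD check: 1 (within 3 minutes).
  WithinThreeMinutes = CASE WHEN @b <= DATEADD(MINUTE, 3, @a) THEN 1 ELSE 0 END;
```

If the 3-minute window is meant as elapsed time, comparing against DATEADD(MINUTE, 3, ...) or using DATEDIFF(SECOND, ...) < 180 may be closer to the intent; if minute granularity is good enough, the DATEDIFF(MINUTE, ...) form is fine.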

Thanks, Aaron.

Grouping records into chunks by time - SQL SERVER 2008
