I have a table that stores Bluetooth detection information. For example:
MACaddress | DetectorID | PollingIntervalStart | PollingIntervalEnd
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:09.000 | 2012-03-26 16:51:19.000
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:24.000 | 2012-03-26 16:51:28.000
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:35.000 | 2012-03-26 16:51:49.000
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:55.000 | 2012-03-26 16:52:09.000
00:00:00:00:32:11 | 3 | 2012-03-26 17:00:43.000 | 2012-03-26 17:01:19.000
00:00:00:00:20:F1 | 1 | 2012-03-26 17:02:52.000 | 2012-03-26 16:53:02.000
...
00:00:00:00:00:01 | 3 | 2012-03-26 19:21:19.000 | 2012-03-26 19:21:48.000
00:00:00:00:00:01 | 3 | 2012-03-26 19:21:59.000 | 2012-03-26 19:22:51.000
00:00:00:00:00:01 | 3 | 2012-03-26 19:22:19.000 | 2012-03-26 19:22:31.000
00:00:00:00:20:F1 | 1 | 2012-03-26 20:23:49.000 | 2012-03-26 19:50:30.000
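DetectorID is the ID of the Bluetooth detector that polled the device. As you can see, a device sometimes stays within a detector's polling radius, so we get clusters of detections for the same device. What I want to do is group each cluster and take its first detection (meaning min(DetectionTime)); say we define a cluster as the same device being polled several times within three minutes. Note that the length of the detector's polling interval is not constant. For example, for the cluster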
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:09.000 | 2012-03-26 16:51:19.000 -- take this record
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:24.000 | 2012-03-26 16:51:28.000
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:35.000 | 2012-03-26 16:51:49.000
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:55.000 | 2012-03-26 16:52:09.000
I only want the first record. After grouping as described above, the table should look like this:
MACaddress | DetectorID | PollingIntervalStart | PollingIntervalEnd
00:00:00:00:00:01 | 3 | 2012-03-26 16:51:09.000 | 2012-03-26 16:51:19.000
00:00:00:00:32:11 | 3 | 2012-03-26 17:00:43.000 | 2012-03-26 17:01:19.000
00:00:00:00:20:F1 | 1 | 2012-03-26 17:02:52.000 | 2012-03-26 16:53:02.000
...
00:00:00:00:00:01 | 3 | 2012-03-26 19:21:19.000 | 2012-03-26 19:21:48.000
00:00:00:00:20:F1 | 1 | 2012-03-26 20:23:49.000 | 2012-03-26 19:50:30.000
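I tried using GROUP BY, ROW_NUMBER, RANK and DENSE_RANK, and I can't seem to figure it out. I also tried using a tally table to build time intervals and joining on those intervals, but that didn't work. Any help is appreciated. Thanks.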
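By "clumps" I mean that if the same device is detected several times within a short period, those detections count as one clump. I defined the interval as 3 minutes. The interval length is arbitrary; it could be any number of minutes, I just picked 3. So if a MAC address is detected at 3:00:22, 3:00:34 and 3:01:44, all three detections are considered one clump. If it is detected at 3:00:22 and 3:07:32, that is not a clump.
It has to be the first detection of the clump. If you have code that gets the last detection of the clump, feel free to post that too; maybe I can use ROW_NUMBER with a descending order to get the desired output.
I changed Aaron's code so that the cluster length is no longer constant. The code now only checks the separation between clusters, so any detections more than 3 minutes apart are not considered one cluster. This new cluster definition makes the code simpler.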
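A minimal sketch of the gap rule described above, using the times from the 3:00:22 example. The date and the device labels 'A' and 'B' are hypothetical, LAG assumes SQL Server 2012 or later, and a gap of exactly 180 seconds is treated here (by assumption) as still inside the clump.

```sql
-- Hypothetical data: the times from the 3:00:22 example above; the date and
-- the device labels 'A' and 'B' are made up for illustration.
DECLARE @t TABLE (Device VARCHAR(10), DetectedAt DATETIME2(0));
INSERT @t VALUES
('A', '2012-03-26 03:00:22'),
('A', '2012-03-26 03:00:34'),
('A', '2012-03-26 03:01:44'),
('B', '2012-03-26 03:00:22'),
('B', '2012-03-26 03:07:32');

DECLARE @gaps TABLE (Device VARCHAR(10), DetectedAt DATETIME2(0),
                     GapSeconds INT NULL, StartsNewClump BIT);

-- LAG (SQL Server 2012+) fetches the same device's previous detection.
-- A row starts a new clump when there is no previous detection
-- or the gap is longer than 3 minutes (180 seconds).
INSERT @gaps
SELECT Device, DetectedAt, GapSeconds,
       CASE WHEN GapSeconds IS NULL OR GapSeconds > 180 THEN 1 ELSE 0 END
FROM
(
    SELECT Device, DetectedAt,
           GapSeconds = DATEDIFF(SECOND,
               LAG(DetectedAt) OVER (PARTITION BY Device ORDER BY DetectedAt),
               DetectedAt)
    FROM @t
) AS g;

-- Expect: A = 1 clump (gaps NULL, 12, 70); B = 2 clumps (gaps NULL, 430).
SELECT * FROM @gaps ORDER BY Device, DetectedAt;
```

Measuring the gap from the previous detection, rather than from the first detection of the clump, matches the "cluster separation" definition in the last paragraph.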
Answer 0 (score: 2)
Given this sample data (I've corrected the row where the start time was later than the end time, which didn't seem right):
DECLARE @d TABLE
(
MACaddress VARCHAR(32),
DetectorID INT,
PollingIntervalStart DATETIME2(0),
PollingIntervalEnd DATETIME2(0)
);
INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');
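This approach gets the last row of each clump. As I said, I think getting the first row is certainly possible too, but I have to move on. This would certainly be easier in SQL Server 2012, which adds several new window functions such as LAG and LEAD.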
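-- Number each device's detections per detector in chronological order.
;WITH x AS
(
SELECT *, rn = ROW_NUMBER() OVER
(PARTITION BY MACaddress, DetectorID ORDER BY PollingIntervalStart)
FROM @d
)
-- Keep a row only if the same device's next detection at the same detector
-- does not start within 3 minutes, i.e. the row is the last one of its clump.
SELECT * FROM x
WHERE NOT EXISTS
(
SELECT 1 FROM x AS x2
WHERE x2.MACaddress = x.MACaddress
AND x2.DetectorID = x.DetectorID
AND x2.rn = x.rn + 1
AND x2.PollingIntervalStart <= DATEADD(MINUTE, 3, x.PollingIntervalStart)
)
ORDER BY x.PollingIntervalStart;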
Results:
MACaddress DetectorID PollingIntervalStart PollingIntervalEnd rn
----------------- ---------- -------------------- ------------------- --
00:00:00:00:00:01 3 2012-03-26 16:51:55 2012-03-26 16:52:09 4
00:00:00:00:32:11 3 2012-03-26 17:00:43 2012-03-26 17:01:19 1
00:00:00:00:20:F1 1 2012-03-26 17:02:52 2012-03-26 16:53:02 1
00:00:00:00:00:01 3 2012-03-26 19:22:19 2012-03-26 19:22:31 7
00:00:00:00:20:F1 1 2012-03-26 19:49:49 2012-03-26 19:50:30 2
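Following up on the SQL Server 2012 remark: a minimal sketch, not part of the original answer, assuming SQL Server 2012 or later. LAG compares each row with the previous detection of the same device at the same detector, and a row is kept when there is no previous detection or more than 180 seconds separate the two PollingIntervalStart values (the 180-second threshold is an assumed stand-in for the 3-minute rule). The table variable @first is illustrative.

```sql
-- Sample data from the answer above.
DECLARE @d TABLE
(
  MACaddress VARCHAR(32),
  DetectorID INT,
  PollingIntervalStart DATETIME2(0),
  PollingIntervalEnd DATETIME2(0)
);
INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');

DECLARE @first TABLE
(
  MACaddress VARCHAR(32), DetectorID INT,
  PollingIntervalStart DATETIME2(0), PollingIntervalEnd DATETIME2(0)
);

WITH x AS
(
  -- PrevStart is NULL for the first detection of a device at a detector.
  SELECT *,
         PrevStart = LAG(PollingIntervalStart) OVER
           (PARTITION BY MACaddress, DetectorID ORDER BY PollingIntervalStart)
  FROM @d
)
INSERT @first
SELECT MACaddress, DetectorID, PollingIntervalStart, PollingIntervalEnd
FROM x
-- Keep the first detection of each clump: no previous detection, or a gap
-- of more than 180 seconds since the previous one started.
WHERE PrevStart IS NULL
   OR DATEDIFF(SECOND, PrevStart, PollingIntervalStart) > 180;

SELECT * FROM @first ORDER BY PollingIntervalStart;
```

On this data it should return the same five rows as the cursor-based version in this answer, without the self-join on rn.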
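Another idea gets the result you want, but uses a cursor. Personally, I think there are cases where cursors are perfectly acceptable (also see this discussion on running totals pre-2012, and keep in mind the caveat that you should use proper cursor options), but others refuse to even look at them. Whether this is practical depends on the size of your data; you should test it.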
DECLARE @newTable TABLE
(
MACaddress VARCHAR(32),
DetectorID INT,
PollingIntervalStart DATETIME2(0),
PollingIntervalEnd DATETIME2(0)
);
DECLARE @PreviousTime DATETIME2(0) = NULL, @ma VARCHAR(32), @de INT,
@st DATETIME2(0), @et DATETIME2(0), @rn INT;
-- Walk each MAC address/detector pair in chronological order.
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
SELECT MACaddress, DetectorID, PollingIntervalStart, PollingIntervalEnd,
rn = ROW_NUMBER() OVER
(PARTITION BY MACaddress, DetectorID ORDER BY PollingIntervalStart)
FROM @d ORDER BY MACaddress, DetectorID, rn;
OPEN c;
FETCH c INTO @ma, @de, @st, @et, @rn;
WHILE @@FETCH_STATUS = 0
BEGIN
-- A row starts a new clump if it is the first detection for this pair,
-- or if more than 3 minutes passed since the previous detection started.
IF @rn = 1 OR DATEDIFF(MINUTE, @PreviousTime, @st) > 3
BEGIN
INSERT @newTable SELECT @ma, @de, @st, @et;
END
SELECT @PreviousTime = @st;
FETCH c INTO @ma, @de, @st, @et, @rn;
END
CLOSE c; DEALLOCATE c;
SELECT * FROM @newTable ORDER BY PollingIntervalStart;
Results:
MACaddress DetectorID PollingIntervalStart PollingIntervalEnd
----------------- ---------- -------------------- -------------------
00:00:00:00:00:01 3 2012-03-26 16:51:09 2012-03-26 16:51:19
00:00:00:00:32:11 3 2012-03-26 17:00:43 2012-03-26 17:01:19
00:00:00:00:20:F1 1 2012-03-26 17:02:52 2012-03-26 16:53:02
00:00:00:00:00:01 3 2012-03-26 19:21:19 2012-03-26 19:21:48
00:00:00:00:20:F1 1 2012-03-26 19:49:49 2012-03-26 19:50:30
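The cursor emits a row whenever a new clump starts. On SQL Server 2012 or later, the same "starts a new clump" flag can be turned into a clump number with a windowed running SUM (the usual gaps-and-islands pattern), which also yields the last detection the question asked about. This is a sketch under assumptions, not the answerer's code: it assumes SQL Server 2012+, a 180-second gap between consecutive PollingIntervalStart values, and the names NewCluster, ClusterNo and @clusters are hypothetical.

```sql
-- Sample data from the answer above.
DECLARE @d TABLE
(
  MACaddress VARCHAR(32),
  DetectorID INT,
  PollingIntervalStart DATETIME2(0),
  PollingIntervalEnd DATETIME2(0)
);
INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');

DECLARE @clusters TABLE
(
  MACaddress VARCHAR(32), DetectorID INT, ClusterNo INT,
  FirstDetection DATETIME2(0), LastDetection DATETIME2(0), Detections INT
);

WITH flagged AS
(
  -- NewCluster = 1 when there is no previous detection (LAG returns NULL,
  -- so the comparison is not true) or the gap exceeds 180 seconds.
  SELECT *,
         NewCluster = CASE
           WHEN DATEDIFF(SECOND,
                  LAG(PollingIntervalStart) OVER
                    (PARTITION BY MACaddress, DetectorID ORDER BY PollingIntervalStart),
                  PollingIntervalStart) <= 180 THEN 0
           ELSE 1 END
  FROM @d
),
numbered AS
(
  -- A running total of the flags numbers the clusters 1, 2, 3, ...
  SELECT *,
         ClusterNo = SUM(NewCluster) OVER
           (PARTITION BY MACaddress, DetectorID ORDER BY PollingIntervalStart
            ROWS UNBOUNDED PRECEDING)
  FROM flagged
)
INSERT @clusters
SELECT MACaddress, DetectorID, ClusterNo,
       MIN(PollingIntervalStart),  -- first detection in the cluster
       MAX(PollingIntervalStart),  -- start of the last detection
       COUNT(*)
FROM numbered
GROUP BY MACaddress, DetectorID, ClusterNo;

SELECT * FROM @clusters ORDER BY FirstDetection;
```

FirstDetection matches the start times in the question's expected output, and LastDetection matches the start times returned by the first (NOT EXISTS) query in this answer.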
Answer 1 (score: 0)
I found the answer by slightly modifying Aaron Bertrand's answer.
Setting up the table:
DECLARE @d TABLE
(
MACaddress VARCHAR(32),
DetectorID INT,
PollingIntervalStart DATETIME2(0),
PollingIntervalEnd DATETIME2(0)
);
INSERT @d VALUES
('00:00:00:00:00:01',3,'2012-03-26 16:51:09.000','2012-03-26 16:51:19.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:24.000','2012-03-26 16:51:28.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:35.000','2012-03-26 16:51:49.000'),
('00:00:00:00:00:01',3,'2012-03-26 16:51:55.000','2012-03-26 16:52:09.000'),
('00:00:00:00:32:11',3,'2012-03-26 17:00:43.000','2012-03-26 17:01:19.000'),
('00:00:00:00:20:F1',1,'2012-03-26 17:02:52.000','2012-03-26 16:53:02.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:19.000','2012-03-26 19:21:48.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:21:59.000','2012-03-26 19:22:51.000'),
('00:00:00:00:00:01',3,'2012-03-26 19:22:19.000','2012-03-26 19:22:31.000'),
('00:00:00:00:20:F1',1,'2012-03-26 19:49:49.000','2012-03-26 19:50:30.000');
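I made two changes to Aaron's code. First, I numbered the rows in the CTE in descending order. Second, in the WHERE NOT EXISTS condition, I replaced the DATEADD check with DATEDIFF(MINUTE, x2.PollingIntervalStart, x.PollingIntervalStart) < 3.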
;WITH x AS
(
SELECT
*,
ROW_NUMBER() OVER
(PARTITION BY MacAddress, DetectorID ORDER BY PollingIntervalStart DESC) AS RN
FROM @d
)
select * from x
WHERE NOT EXISTS
(
SELECT 1 FROM x AS x2
WHERE x2.MACaddress = x.MacAddress
AND x2.DetectorID = x.DetectorID
AND x2.rn = x.rn + 1
-- x2.PollingIntervalStart is always earlier than x.PollingIntervalStart because of the x2.rn = x.rn + 1 condition;
-- this works because the CTE numbers the rows in descending order
AND DATEDIFF(MINUTE, x2.PollingIntervalStart, x.PollingIntervalStart) < 3
)
ORDER BY x.PollingIntervalStart;
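One detail worth checking in both this query and the cursor above: DATEDIFF(MINUTE, ...) counts minute boundaries crossed, not elapsed minutes. A minimal sketch with two hypothetical pairs of start times (not from the question's data) shows that both report 3 minutes even though one pair is about 2 minutes apart and the other about 4.

```sql
-- Two hypothetical pairs of detection start times.
DECLARE @pairs TABLE (PrevStart DATETIME2(0), CurrStart DATETIME2(0));
INSERT @pairs VALUES
('2012-03-26 16:51:59', '2012-03-26 16:54:00'),  -- about 2 minutes apart
('2012-03-26 16:51:00', '2012-03-26 16:54:59');  -- about 4 minutes apart

DECLARE @diffs TABLE (PrevStart DATETIME2(0), CurrStart DATETIME2(0),
                      Minutes INT, Seconds INT);

-- DATEDIFF(MINUTE, ...) counts minute boundaries crossed (52, 53, 54),
-- so both pairs report 3; DATEDIFF(SECOND, ...) reports 121 and 239.
INSERT @diffs
SELECT PrevStart, CurrStart,
       DATEDIFF(MINUTE, PrevStart, CurrStart),
       DATEDIFF(SECOND, PrevStart, CurrStart)
FROM @pairs;

SELECT * FROM @diffs;
```

With the `< 3` minute check, the first pair would start a new cluster although only about 2 minutes passed, and with the cursor's `> 3` check the second pair would stay in one cluster although about 4 minutes passed. If exact 3-minute behavior matters, comparing seconds (for example `DATEDIFF(SECOND, x2.PollingIntervalStart, x.PollingIntervalStart) <= 180`) follows the stated rule more closely.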
Thanks, Aaron.