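I have a SQL Server 2014 table containing millions of GPS coordinates, each recorded at a specific time. However, the interval between registrations is not fixed; it varies from 1 second to several hours. I only want to keep one measurement every 4 minutes, so the other records have to be deleted.
In T-SQL I tried a WHILE loop that iterates over every record. Inside the loop, a SELECT statement with a double CROSS APPLY returns the record only if it lies between 2 other records that are no more than 4 minutes apart. However, this strategy is far too slow.
Can this be done with a set-based solution? Or is there a way to speed up this query? (The test query below only prints; it does not delete anything yet.)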
-- Working copy of the table; the loop removes one row from it per iteration
SELECT * INTO #myTemp FROM gps;

DECLARE @Id uniqueidentifier;
DECLARE @d1 varchar(19), @d2 varchar(19), @d3 varchar(19);

WHILE EXISTS (SELECT * FROM #myTemp)
BEGIN
    -- Earliest row not yet processed
    SELECT TOP 1 @Id = Id FROM #myTemp ORDER BY TimePoint ASC;

    -- A SELECT that returns no rows leaves the variables unchanged, so clear them first
    SELECT @d1 = NULL, @d2 = NULL, @d3 = NULL;

    SELECT
        @d1 = CONVERT(varchar(19), A.justbefore, 121),
        @d2 = CONVERT(varchar(19), B.TimePoint, 121),
        @d3 = CONVERT(varchar(19), C.justafter, 121)
    FROM Gps B
    CROSS APPLY (
        -- Nearest earlier measurement
        SELECT TOP 1 TimePoint AS justbefore
        FROM Gps
        WHERE TimePoint < B.TimePoint
        ORDER BY TimePoint DESC
    ) A
    CROSS APPLY (
        -- Nearest later measurement, only if it is at most 4 minutes after justbefore
        SELECT TOP 1 TimePoint AS justafter
        FROM Gps
        WHERE TimePoint > B.TimePoint
          AND DATEDIFF(n, A.justbefore, TimePoint) BETWEEN 0 AND 4
        ORDER BY TimePoint ASC
    ) C
    WHERE B.Id = @Id;

    -- Prints an empty line when the row is not between two close neighbours
    PRINT 'ID=' + CAST(@Id AS varchar(50))
        + ' / d1=' + @d1 + ' / d2=' + @d2 + ' / d3=' + @d3;

    DELETE #myTemp WHERE Id = @Id;
END
Sample data:
Id  TimePoint          Lat     Lon
1   20170725 13:05:27  12,256  24,123
2   20170725 13:10:27  12,254  24,120
3   20170725 13:10:29  12,253  24,125
4   20170725 13:11:55  12,259  24,127
5   20170725 13:11:59  12,255  24,123
6   20170725 13:14:28  12,254  24,126
7   20170725 13:16:52  12,259  24,121
8   20170725 13:20:53  12,257  24,125
In this case, records 3, 4 and 5 should be deleted. Record 7 should stay, as the gap between 7 and 8 is more than 4 minutes.
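The per-row test inside the loop (is a row sandwiched between two measurements at most 4 minutes apart?) can be evaluated for every row in one statement with `LAG`/`LEAD`, which SQL Server has supported since 2012. A minimal sketch, assuming a hypothetical temp table `#Gps` holding the sample timestamps (with `INT` ids instead of `uniqueidentifier`) and treating 4 minutes as 240 seconds rather than 4 `DATEDIFF` minute boundaries:

```sql
-- Hypothetical sample table; the real gps table uses uniqueidentifier ids
IF OBJECT_ID('tempdb..#Gps', 'U') IS NOT NULL
    DROP TABLE #Gps;
CREATE TABLE #Gps (
    Id INT NOT NULL PRIMARY KEY,
    TimePoint DATETIME2(0) NOT NULL
);
INSERT #Gps (Id, TimePoint) VALUES
    (1, '20170725 13:05:27'), (2, '20170725 13:10:27'),
    (3, '20170725 13:10:29'), (4, '20170725 13:11:55'),
    (5, '20170725 13:11:59'), (6, '20170725 13:14:28'),
    (7, '20170725 13:16:52'), (8, '20170725 13:20:53');

-- One pass: previous and next TimePoint for every row
WITH cte_Neighbours AS (
    SELECT
        g.Id,
        PrevTime = LAG(g.TimePoint) OVER (ORDER BY g.TimePoint),
        NextTime = LEAD(g.TimePoint) OVER (ORDER BY g.TimePoint)
    FROM #Gps g
)
-- A row is redundant when its two neighbours are at most 240 s apart.
-- The first and last rows have a NULL neighbour, so the test is never true for them.
DELETE g
FROM #Gps g
JOIN cte_Neighbours n ON n.Id = g.Id
WHERE DATEDIFF(SECOND, n.PrevTime, n.NextTime) <= 240;

SELECT Id, TimePoint FROM #Gps ORDER BY TimePoint;
```

On the sample data this deletes rows 3, 4 and 5 and keeps 7, as described above. Because every row is judged against its neighbours in the original sequence, deleting all flagged rows at once can leave a gap slightly over 4 minutes (here from row 2 to row 6 it is 4 minutes 1 second).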
Answer 0 (score: 0)
Looking at the numbers... it looks like 1 & 2 stay (5 minutes apart)... 3, 4 & 5 should go... 6 stays (4 minutes from 2)... 7 should go (only 2 minutes from 6) and 8 stays (6 minutes from 6)...
If this is correct, the following will do what you're looking for...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
    DROP TABLE #TestData;

CREATE TABLE #TestData (
    Id INT NOT NULL PRIMARY KEY CLUSTERED,
    TimePoint DATETIME2(0) NOT NULL,
    Lat DECIMAL(9,3),
    Lon DECIMAL(9,3)
);

INSERT #TestData (Id, TimePoint, Lat, Lon) VALUES
    (1, '20170725 13:05:27', 12.256, 24.123),
    (2, '20170725 13:10:27', 12.254, 24.120),
    (3, '20170725 13:10:29', 12.253, 24.125),
    (4, '20170725 13:11:55', 12.259, 24.127),
    (5, '20170725 13:11:59', 12.255, 24.123),
    (6, '20170725 13:14:28', 12.254, 24.126),
    (7, '20170725 13:16:52', 12.259, 24.121),
    (8, '20170725 13:20:53', 12.257, 24.125);

-- SELECT * FROM #TestData td;
--================================================================================
WITH
    cte_AddLag AS (
        SELECT
            td.Id, td.TimePoint, td.Lat, td.Lon,
            -- Minutes since the previous row. DATEDIFF(mi, ...) counts minute boundaries
            -- crossed, not whole elapsed minutes; LAG gives NULL for the first row.
            MinFromPrev = DATEDIFF(mi, LAG(td.TimePoint, 1) OVER (ORDER BY td.TimePoint), td.TimePoint)
        FROM
            #TestData td
    ),
    cte_TimeGroup AS (
        SELECT
            *,
            -- Running total of minutes, cut into 4-minute buckets;
            -- the first row's total is NULL, so ISNULL puts it in bucket 0
            TimeGroup = ISNULL(SUM(al.MinFromPrev) OVER (ORDER BY al.TimePoint ROWS UNBOUNDED PRECEDING) / 4, 0)
        FROM
            cte_AddLag al
    )
-- TOP 1 WITH TIES returns every row whose ROW_NUMBER is 1: the earliest row of each bucket
SELECT TOP 1 WITH TIES
    tg.Id,
    tg.TimePoint,
    tg.Lat,
    tg.Lon
FROM
    cte_TimeGroup tg
ORDER BY
    ROW_NUMBER() OVER (PARTITION BY tg.TimeGroup ORDER BY tg.TimePoint);
The results...
Id TimePoint Lat Lon
----------- --------------------------- --------------------------------------- ---------------------------------------
1 2017-07-25 13:05:27 12.256 24.123
2 2017-07-25 13:10:27 12.254 24.120
6 2017-07-25 13:14:28 12.254 24.126
8 2017-07-25 13:20:53 12.257 24.125
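The query above only selects the rows to keep, while the question asks to delete the rest. Because `DATEDIFF(mi, ...)` counts minute boundaries, the running `SUM` of `MinFromPrev` telescopes to `DATEDIFF(mi, <first TimePoint>, TimePoint)` (for row 6: 5 + 0 + 1 + 0 + 3 = 9 = 14 - 5), so `TimeGroup` is simply the 4-minute bucket measured from the first row. A sketch of a `DELETE` built on that observation, with the sample timestamps only; the variable `@First` and the CTE name `cte_Ranked` are illustrative:

```sql
-- Hypothetical sample table holding the question's timestamps
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
    DROP TABLE #TestData;
CREATE TABLE #TestData (
    Id INT NOT NULL PRIMARY KEY CLUSTERED,
    TimePoint DATETIME2(0) NOT NULL
);
INSERT #TestData (Id, TimePoint) VALUES
    (1, '20170725 13:05:27'), (2, '20170725 13:10:27'),
    (3, '20170725 13:10:29'), (4, '20170725 13:11:55'),
    (5, '20170725 13:11:59'), (6, '20170725 13:14:28'),
    (7, '20170725 13:16:52'), (8, '20170725 13:20:53');

-- Buckets are counted from the earliest measurement
DECLARE @First DATETIME2(0) = (SELECT MIN(TimePoint) FROM #TestData);

WITH cte_Ranked AS (
    SELECT
        td.Id,
        -- Same bucket number as TimeGroup: minute boundaries since @First, divided by 4
        rn = ROW_NUMBER() OVER (
                 PARTITION BY DATEDIFF(mi, @First, td.TimePoint) / 4
                 ORDER BY td.TimePoint)
    FROM #TestData td
)
-- Keep the earliest row of each bucket, delete the others
DELETE t
FROM #TestData t
JOIN cte_Ranked r ON r.Id = t.Id
WHERE r.rn > 1;

SELECT Id, TimePoint FROM #TestData ORDER BY TimePoint;
```

It leaves rows 1, 2, 6 and 8, the same rows as the result above.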
HTH, Jason
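The bucketing fixes the 4-minute boundaries relative to the first row, so it only approximates the rule spelled out at the top of the answer (a row stays only if it is at least 4 minutes after the last row that stayed). On other data the two can differ: with rows at 13:00:00, 13:04:30 and 13:08:00 the buckets keep all three, although 13:08:00 is only 3.5 minutes after 13:04:30. Since each decision depends on the previous one, following the rule exactly needs a sequential walk; one way to write it as a single statement is a recursive CTE. A sketch under assumptions (hypothetical `#Gps` and `#Ordered` temp tables, a 240-second threshold); it still processes one row per recursion level, so its speed on millions of rows would have to be measured:

```sql
IF OBJECT_ID('tempdb..#Gps', 'U') IS NOT NULL DROP TABLE #Gps;
IF OBJECT_ID('tempdb..#Ordered', 'U') IS NOT NULL DROP TABLE #Ordered;
CREATE TABLE #Gps (Id INT NOT NULL PRIMARY KEY, TimePoint DATETIME2(0) NOT NULL);
INSERT #Gps (Id, TimePoint) VALUES
    (1, '20170725 13:05:27'), (2, '20170725 13:10:27'),
    (3, '20170725 13:10:29'), (4, '20170725 13:11:55'),
    (5, '20170725 13:11:59'), (6, '20170725 13:14:28'),
    (7, '20170725 13:16:52'), (8, '20170725 13:20:53');

-- Number the rows once and index the number, so each recursion step is a seek
SELECT Id, TimePoint, rn = ROW_NUMBER() OVER (ORDER BY TimePoint, Id)
INTO #Ordered
FROM #Gps;
CREATE UNIQUE CLUSTERED INDEX ix_Ordered_rn ON #Ordered (rn);

-- Walk the rows in time order, carrying the TimePoint of the last kept row
WITH cte_Walk AS (
    SELECT o.rn, o.Id, LastKept = o.TimePoint, Keep = 1
    FROM #Ordered o
    WHERE o.rn = 1
    UNION ALL
    SELECT o.rn, o.Id,
        LastKept = CASE WHEN DATEDIFF(SECOND, w.LastKept, o.TimePoint) >= 240
                        THEN o.TimePoint ELSE w.LastKept END,
        Keep = CASE WHEN DATEDIFF(SECOND, w.LastKept, o.TimePoint) >= 240
                    THEN 1 ELSE 0 END
    FROM cte_Walk w
    JOIN #Ordered o ON o.rn = w.rn + 1
)
DELETE g
FROM #Gps g
JOIN cte_Walk w ON w.Id = g.Id
WHERE w.Keep = 0
OPTION (MAXRECURSION 0);  -- the default limit is 100 levels; the walk needs one per row

SELECT Id, TimePoint FROM #Gps ORDER BY TimePoint;
```

On the sample data it keeps rows 1, 2, 6 and 8, the same rows as the bucketing query.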