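Note: this is advanced SQL... (in my humble opinion)
Summary of what I am trying to do:
I have an application that puts data into a database with timestamps. I want to go through this data in chronological order and determine the duration between a "start" point and a "stop" point.
Here is the UserRoadData table: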
CREATE TABLE [dbo].[UserRoadData](
    [Indx] [uniqueidentifier] NOT NULL,
    [Indx_User] [uniqueidentifier] NOT NULL,
    [Indx_RoadPoint] [uniqueidentifier] NOT NULL,
    [TimeHit] [datetime] NOT NULL,
    [Indx_UserRoadDataStatus] [int] NOT NULL,
    CONSTRAINT [PK_UserRoadData] PRIMARY KEY CLUSTERED
Here is the RoadPoints table:
CREATE TABLE [dbo].[RoadPoints](
    [Indx] [uniqueidentifier] NOT NULL,
    [Indx_Road] [uniqueidentifier] NOT NULL,
    [Indx_PointType] [int] NOT NULL,
    [Sequence] [int] NOT NULL,
    [SegmentNumber] [int] NOT NULL,
    [GeoLoc] [geography] NOT NULL,
    CONSTRAINT [PK_RoadPoints] PRIMARY KEY CLUSTERED
I have the following query that attempts to determine the duration (note: at this point @NotProcessed = 0)
declare @Now datetime
set @Now = GETDATE();

with CTE_1 as (
    SELECT
        Indx_RoadPoint
        ,Indx_Road
        ,SegmentNumber
        ,Indx_UserRoadDataStatus
        ,case WHEN Indx_PointType = @SegmentStart THEN TimeHit
              ELSE NULL
         END AS InTime
        ,case WHEN Indx_PointType = @SegmentEnd THEN TimeHit
              ELSE NULL
         END AS OutTime
        ,row_number() OVER ( ORDER BY TimeHit ASC ) AS rowNum
    FROM
        UserRoadData INNER JOIN RoadPoints ON UserRoadData.Indx_RoadPoint = RoadPoints.Indx
    where
        Indx_User = @Indx_User
        and Indx_Road = @Indx_Road
        and Indx_UserRoadDataStatus = @NotProcessed
),
CTE_2 -- mark those that repeat
AS (
    SELECT
        t.Indx_RoadPoint
        ,t.Indx_Road
        ,t.SegmentNumber
        ,t.InTime
        ,t.OutTime
        ,t.rowNum
        ,case WHEN ( SELECT Indx_RoadPoint
                     FROM CTE_1 AS x
                     WHERE x.rowNum = t.rowNum - 1
                   ) = t.Indx_RoadPoint THEN 1
              ELSE 0
         END AS mrk
    FROM CTE_1 AS t
),
CTE_3 --extract non repeats and group
AS (
    SELECT
        *
        ,row_number() OVER ( PARTITION BY Indx_RoadPoint ORDER BY rowNum ASC ) AS rn2
    FROM CTE_2
    WHERE mrk = 0
)
SELECT
    newid() as indx -- this table needs to have the same columns as UserRideData
    ,t1.Indx_Road
    ,@Indx_User as Indx_User
    ,t1.SegmentNumber
    ,t1.InTime
    ,t2.OutTime
    ,datediff(ss, t1.InTime, t2.OutTime) AS Duration
    ,@Now as DateRecorded
INTO #RoadAndRoadData -- results are put into a temp table...
FROM
    CTE_3 AS t1
    JOIN CTE_3 AS t2 ON t1.rn2 = t2.rn2
WHERE
    t1.Intime IS NOT NULL
    AND t2.OutTime IS NOT NULL
GROUP BY
    t1.Indx_Road
    ,t1.SegmentNumber
    ,t1.InTime
    ,t2.OutTime

-- insert the temp table data into the real table
INSERT INTO dbo.UserRideData SELECT * FROM #RoadAndRoadData
Here is the entire output of the CTE_1 statement, and it contains the correct data! The GUIDs (Indx) have been shortened to protect the innocent :-)
Indx      Indx_Road  Segment  Status  InTime                   OutTime                  rowNum
87502C53  28992B99   0        0       NULL                     NULL                     1
17BB6691  28992B99   0        0       NULL                     NULL                     2
C40FD10F  28992B99   1        0       2014-06-11 09:09:11.200  NULL                     3
7BC5D0A6  28992B99   1        0       NULL                     NULL                     4
97BA8F20  28992B99   1        0       NULL                     NULL                     5
75A3F916  28992B99   1        0       NULL                     NULL                     6
FA2B73E5  28992B99   1        0       NULL                     NULL                     7
D1E16249  28992B99   1        0       NULL                     NULL                     8
BAB45A3C  28992B99   1        0       NULL                     NULL                     9
0EC3D9AD  28992B99   1        0       NULL                     NULL                     10
3A0BAF2A  28992B99   1        0       NULL                     NULL                     11
5B97F78A  28992B99   1        0       NULL                     2014-06-11 09:09:20.200  12
E55C20C5  28992B99   2        0       2014-06-11 09:09:21.200  NULL                     13
FBC14E4E  28992B99   2        0       NULL                     NULL                     14
5396D1FF  28992B99   2        0       NULL                     NULL                     15
63D5F64B  28992B99   2        0       NULL                     NULL                     16
A463F4FA  28992B99   2        0       NULL                     2014-06-11 09:09:25.200  17
F6A528D8  28992B99   0        0       NULL                     NULL                     18
1D73335D  28992B99   0        0       NULL                     NULL                     19
As you can see, each unique segment has one start time followed by one end time, for example:
Segment 1 indx C40FD10F has a non null start time
Segment 1 indx 5B97F78A has a non null stop time --- ( PAIR 1 )
Segment 2 indx E55C20C5 has a non null start time
Segment 2 indx A463F4FA has a non null stop time --- ( PAIR 2 )
This is the data that really matters from the output above. The question comes down to getting just the durations of PAIR 1 and PAIR 2 into a table:
Indx      Indx_Road  Segment  Status  InTime                   OutTime                  rowNum
C40FD10F  28992B99   1        0       2014-06-11 09:09:11.200  NULL                     3
5B97F78A  28992B99   1        0       NULL                     2014-06-11 09:09:20.200  12      --- ( PAIR 1 )
E55C20C5  28992B99   2        0       2014-06-11 09:09:21.200  NULL                     13
A463F4FA  28992B99   2        0       NULL                     2014-06-11 09:09:25.200  17      --- ( PAIR 2 )
Here are the results when the stored proc is run. Notice that segments 1 and 2 each have two durations... one of which is a false positive:
indx      Indx_Road  Indx_User  SegmentNumber  InTime                   OutTime                  Duration  DateRecorded
382A9F0D  28992B99   22222222   1              2014-06-11 09:09:11.200  2014-06-11 09:09:20.200  9         2014-06-11 09:09:28.207
BC942182  28992B99   22222222   1              2014-06-11 09:09:11.200  2014-06-11 09:09:25.200  14        2014-06-11 09:09:28.207
548A0340  28992B99   22222222   2              2014-06-11 09:09:21.200  2014-06-11 09:09:20.200  -1        2014-06-11 09:09:28.207
E8322022  28992B99   22222222   2              2014-06-11 09:09:21.200  2014-06-11 09:09:25.200  4         2014-06-11 09:09:28.207
I would like just one duration per segment, like so (from PAIR 1 and PAIR 2 above):
indx      Indx_Road  Indx_User  SegmentNumber  InTime                   OutTime                  Duration  DateRecorded
382A9F0D  28992B99   22222222   1              2014-06-11 09:09:11.200  2014-06-11 09:09:20.200  9         2014-06-11 09:09:28.207
E8322022  28992B99   22222222   2              2014-06-11 09:09:21.200  2014-06-11 09:09:25.200  4         2014-06-11 09:09:28.207
Any insight you can provide would be greatly appreciated!
...thanks
Answer 0 (score: 1)
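It's hard to say without sample starting data, but I agree with @Kevin: your query can be simplified from the start. In particular, it does a lot of unneeded work whose results are mostly thrown away.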
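This query makes the following assumptions:
- Only the In and Out times matter, and there is just one of each per segment.
- The Out time is always later than the In time.
- SegmentNumber is correct for all in/out pairs (although it could be generated if necessary).

You should be able to use something close to the following: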
SELECT NEWID() AS Indx,
       @Indx_Road AS Indx_Road, @Indx_User AS Indx_User,
       SegmentNumber, -- added so the columns line up with the result set shown in the question
       inTime, outTime, DATEDIFF(second, inTime, outTime) AS duration,
       GETDATE() AS dateRecorded
FROM (SELECT Road.SegmentNumber,
             MIN(UserRoad.timeHit) AS inTime, MAX(UserRoad.timeHit) AS outTime
      FROM UserRoadData UserRoad
      JOIN RoadPoints Road
        ON Road.Indx = UserRoad.Indx_RoadPoint
       AND Road.Indx_Road = @Indx_Road
       AND Road.Indx_PointType IN (@SegmentStart, @SegmentEnd)
      WHERE UserRoad.Indx_User = @Indx_User
        AND UserRoad.Indx_UserRoadDataStatus = @NotProcessed
      GROUP BY Road.SegmentNumber) SegmentTime
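I doubt you'll be able to use an index to answer the GROUP BY directly, although the other clauses should restrict your starting set considerably. You were simply discarding most of the rows returned from CTE_1, so I didn't bother including them at all.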
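I don't know what effect this will have on performance in your situation, but you should be able to insert the result directly into the destination table, without the intermediate temp table.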
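A minimal sketch of the direct insert, under stated assumptions: @SegmentTime is a hypothetical stand-in for the per-segment derived table, and @UserRideData is a hypothetical stand-in for dbo.UserRideData, with its columns guessed from the result set shown in the question. Table variables are used only so the snippet runs on its own.

```sql
-- Hypothetical parameter values for illustration only.
DECLARE @Indx_User uniqueidentifier = NEWID();
DECLARE @Indx_Road uniqueidentifier = NEWID();

-- Stand-in for the per-segment in/out times (values taken from the sample data).
DECLARE @SegmentTime TABLE (SegmentNumber int, InTime datetime, OutTime datetime);
INSERT INTO @SegmentTime (SegmentNumber, InTime, OutTime) VALUES
    (1, '2014-06-11T09:09:11.200', '2014-06-11T09:09:20.200'),
    (2, '2014-06-11T09:09:21.200', '2014-06-11T09:09:25.200');

-- Stand-in for dbo.UserRideData; the column list is an assumption.
DECLARE @UserRideData TABLE (
    Indx          uniqueidentifier NOT NULL,
    Indx_Road     uniqueidentifier NOT NULL,
    Indx_User     uniqueidentifier NOT NULL,
    SegmentNumber int              NOT NULL,
    InTime        datetime         NOT NULL,
    OutTime       datetime         NOT NULL,
    Duration      int              NOT NULL,
    DateRecorded  datetime         NOT NULL
);

-- Insert straight from the query, naming the target columns explicitly
-- instead of relying on SELECT * and column order.
INSERT INTO @UserRideData
    (Indx, Indx_Road, Indx_User, SegmentNumber, InTime, OutTime, Duration, DateRecorded)
SELECT NEWID(), @Indx_Road, @Indx_User, SegmentNumber, InTime, OutTime,
       DATEDIFF(second, InTime, OutTime), GETDATE()
FROM @SegmentTime;

SELECT SegmentNumber, InTime, OutTime, Duration
FROM @UserRideData
ORDER BY SegmentNumber;
```

Naming the target columns keeps the insert working if the destination table later gains a column or changes its column order, which INSERT ... SELECT * would not tolerate.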
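Note that this query is written on the assumption that you have few instances of the input parameters and run it on demand. If you're running this over bulk entries, the query should be changed.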
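One way a bulk variant might look, as a sketch only: drop the per-user and per-road parameters and group by user, road and segment instead. The table variables, the int keys (used in place of uniqueidentifier for readability) and the point-type values 1 and 2 are all assumptions made so the example runs on its own.

```sql
-- Hypothetical values; the real point-type and status codes are not given in the question.
DECLARE @SegmentStart int = 1, @SegmentEnd int = 2, @NotProcessed int = 0;

-- Simplified stand-ins for RoadPoints and UserRoadData.
DECLARE @RoadPoints TABLE (Indx int, Indx_Road int, Indx_PointType int, SegmentNumber int);
INSERT INTO @RoadPoints (Indx, Indx_Road, Indx_PointType, SegmentNumber) VALUES
    (10, 100, 1, 1),   -- road 100, segment 1, start point
    (11, 100, 2, 1),   -- road 100, segment 1, end point
    (20, 200, 1, 1),   -- road 200, segment 1, start point
    (21, 200, 2, 1);   -- road 200, segment 1, end point

DECLARE @UserRoadData TABLE (Indx_User int, Indx_RoadPoint int, TimeHit datetime, Indx_UserRoadDataStatus int);
INSERT INTO @UserRoadData (Indx_User, Indx_RoadPoint, TimeHit, Indx_UserRoadDataStatus) VALUES
    (1, 10, '2014-06-11T09:09:11.200', 0),
    (1, 11, '2014-06-11T09:09:20.200', 0),
    (2, 20, '2014-06-11T09:10:00.000', 0),
    (2, 21, '2014-06-11T09:10:30.000', 0);

-- One row per (user, road, segment) instead of one parameterised call per user and road.
SELECT s.Indx_User, s.Indx_Road, s.SegmentNumber, s.InTime, s.OutTime,
       DATEDIFF(second, s.InTime, s.OutTime) AS Duration
FROM (SELECT urd.Indx_User, rp.Indx_Road, rp.SegmentNumber,
             MIN(CASE WHEN rp.Indx_PointType = @SegmentStart THEN urd.TimeHit END) AS InTime,
             MAX(CASE WHEN rp.Indx_PointType = @SegmentEnd   THEN urd.TimeHit END) AS OutTime
      FROM @UserRoadData AS urd
      JOIN @RoadPoints AS rp
        ON rp.Indx = urd.Indx_RoadPoint
      WHERE rp.Indx_PointType IN (@SegmentStart, @SegmentEnd)
        AND urd.Indx_UserRoadDataStatus = @NotProcessed
      GROUP BY urd.Indx_User, rp.Indx_Road, rp.SegmentNumber) AS s
ORDER BY s.Indx_User, s.Indx_Road, s.SegmentNumber;
```

With the sample rows above this yields a 9-second duration for user 1 and a 30-second duration for user 2. Like the single-user query, it still assumes one start and one stop per segment.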
Answer 1 (score: 0)
SELECT
    CTE_2.Indx,
    CTE_2.Indx_Road,
    CTE_2.Indx_User,
    CTE_2.SegmentNumber,
    CTE_2.InTime,
    CTE_2.OutTime,
    DATEDIFF(SECOND, CTE_2.InTime, CTE_2.OutTime) AS Duration,
    GETDATE() AS DateRecorded
FROM
(
    SELECT
        newid() AS Indx,
        CTE_1.Indx_Road,
        @Indx_User AS Indx_User,
        CTE_1.SegmentNumber,
        MAX(CTE_1.InTime) AS InTime,
        MAX(CTE_1.OutTime) AS OutTime
    FROM
    (
        SELECT
            urd.Indx_RoadPoint,
            rp.Indx_Road,
            rp.SegmentNumber,
            urd.Indx_UserRoadDataStatus,
            (CASE WHEN Indx_PointType = @SegmentStart THEN TimeHit ELSE NULL END) AS InTime,
            (CASE WHEN Indx_PointType = @SegmentEnd THEN TimeHit ELSE NULL END) AS OutTime,
            (row_number() OVER ( ORDER BY TimeHit ASC )) AS rowNum
        FROM UserRoadData urd
        INNER JOIN RoadPoints rp
            ON urd.Indx_RoadPoint = rp.Indx
            AND (rp.Indx_PointType = @SegmentStart OR rp.Indx_PointType = @SegmentEnd)
        WHERE urd.Indx_User = @Indx_User
            and rp.Indx_Road = @Indx_Road
            and urd.Indx_UserRoadDataStatus = @NotProcessed
    ) AS CTE_1
    GROUP BY CTE_1.Indx_Road, CTE_1.SegmentNumber
) AS CTE_2
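A possible explanation of why grouping by SegmentNumber removes the false positives: in the sample output every road point appears only once, so rn2 in the question's CTE_3 would be 1 on every row, and a self-join on rn2 alone pairs every start with every stop. The sketch below reproduces that on a hypothetical table variable @Cte3 holding just the four start/stop rows; the table variable and the assumption that rn2 is always 1 come from reading the sample data, not from the original procedure.

```sql
-- Hypothetical stand-in for the start/stop rows of CTE_3 in the question.
-- rn2 is assumed to be 1 everywhere because each Indx_RoadPoint occurs once.
DECLARE @Cte3 TABLE (
    SegmentNumber int      NOT NULL,
    InTime        datetime NULL,
    OutTime       datetime NULL,
    rn2           int      NOT NULL
);

INSERT INTO @Cte3 (SegmentNumber, InTime, OutTime, rn2) VALUES
    (1, '2014-06-11T09:09:11.200', NULL, 1),
    (1, NULL, '2014-06-11T09:09:20.200', 1),
    (2, '2014-06-11T09:09:21.200', NULL, 1),
    (2, NULL, '2014-06-11T09:09:25.200', 1);

-- Join on rn2 only: every start is paired with every stop (four rows).
SELECT t1.SegmentNumber, t1.InTime, t2.OutTime,
       DATEDIFF(second, t1.InTime, t2.OutTime) AS Duration
FROM @Cte3 AS t1
JOIN @Cte3 AS t2
  ON t1.rn2 = t2.rn2
WHERE t1.InTime IS NOT NULL
  AND t2.OutTime IS NOT NULL;

-- Also matching on SegmentNumber keeps one start/stop pair per segment (two rows).
SELECT t1.SegmentNumber, t1.InTime, t2.OutTime,
       DATEDIFF(second, t1.InTime, t2.OutTime) AS Duration
FROM @Cte3 AS t1
JOIN @Cte3 AS t2
  ON t1.rn2 = t2.rn2
 AND t1.SegmentNumber = t2.SegmentNumber
WHERE t1.InTime IS NOT NULL
  AND t2.OutTime IS NOT NULL;
```

The first query returns durations of 9, 14, -1 and 4 seconds, matching the four rows reported in the question; the second returns only 9 and 4. Collapsing each segment with GROUP BY and MAX, as in the query above, has the same effect as long as each segment has a single start and a single stop.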