I have a dataset with the following columns:
DriverId DateStamp IsDriving WasDriving DistanceSincePrev SecondsSincePrev
1 11/10/2018 08:00 0 0 0 12
1 11/10/2018 08:01 1 0 10 60
1 11/10/2018 08:01 1 1 100 54
1 11/10/2018 08:02 1 1 14 32
1 11/10/2018 08:03 1 1 33 60
1 11/10/2018 08:04 0 1 10 59
1 11/10/2018 08:04 0 0 0 60
1 11/10/2018 08:05 1 0 0 60
1 11/10/2018 08:06 1 1 500 43
1 11/10/2018 08:06 0 1 300 32
1 11/10/2018 08:07 0 0 0 60
1 11/10/2018 08:08 0 0 0 12
1 11/10/2018 08:09 0 0 10 60
1 11/10/2018 08:10 0 0 100 54
1 11/10/2018 08:11 0 0 14 32
1 11/10/2018 08:12 0 0 33 60
1 11/10/2018 08:13 0 0 10 59
1 11/10/2018 08:14 0 0 0 60
1 11/10/2018 08:15 1 0 0 60
1 11/10/2018 08:16 1 1 500 43
1 11/10/2018 08:16 1 1 300 32
1 11/10/2018 08:17 1 1 0 60
1 11/10/2018 08:18 1 1 500 43
1 11/10/2018 08:19 1 1 300 32
1 11/10/2018 08:19 1 1 0 60
1 11/10/2018 08:20 1 1 500 43
1 11/10/2018 08:21 1 1 300 32
1 11/10/2018 08:22 1 1 0 60
1 11/10/2018 08:23 1 1 500 43
1 11/10/2018 08:24 1 1 300 32
1 11/10/2018 08:24 0 1 0 60
1 11/10/2018 08:25 0 0 0 60
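As you can see, these are timestamped records of one person's driving. I want to group these records into RIDES, by which I mean the stretches the person drives without turning off the engine. In this dataset I should be able to do that with the IsDriving and WasDriving columns, but I'm having trouble writing the query.
I have two ideas for how the algorithm could work:
1) Preferable, but probably harder: the query detects a record where IsDriving is 1 and WasDriving is 0 and treats it as the start of a ride. It then detects a record where IsDriving is 0 and WasDriving is 1 and ends the ride there.
2) A bit heuristic, but good enough: the query simply aggregates consecutive records where IsDriving and WasDriving are both 1 and counts them as one ride.
Unfortunately, I haven't been able to translate either algorithm into SQL.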
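To pin down what idea 1 means before turning to SQL, here is a minimal procedural sketch in Python over the first ten sample rows. The function name find_rides is hypothetical. It also assumes that the distance and seconds on the start record belong to the idle interval before the ride, so only records with WasDriving = 1 are added to the totals. That is one possible reading of the data, not something the dataset itself settles.

```python
# (DateStamp, IsDriving, WasDriving, DistanceSincePrev, SecondsSincePrev)
# First ten rows of the sample data, kept in their original order.
rows = [
    ("11/10/2018 08:00", 0, 0, 0, 12),
    ("11/10/2018 08:01", 1, 0, 10, 60),
    ("11/10/2018 08:01", 1, 1, 100, 54),
    ("11/10/2018 08:02", 1, 1, 14, 32),
    ("11/10/2018 08:03", 1, 1, 33, 60),
    ("11/10/2018 08:04", 0, 1, 10, 59),
    ("11/10/2018 08:04", 0, 0, 0, 60),
    ("11/10/2018 08:05", 1, 0, 0, 60),
    ("11/10/2018 08:06", 1, 1, 500, 43),
    ("11/10/2018 08:06", 0, 1, 300, 32),
]


def find_rides(rows):
    """Scan rows in order: open a ride on (1, 0), close it on (0, 1)."""
    rides = []
    current = None
    for stamp, is_driving, was_driving, dist, secs in rows:
        if is_driving == 1 and was_driving == 0:
            # Start of a ride. Assumption: this row's distance and seconds
            # describe the idle interval before it, so they are not counted.
            current = {"start": stamp, "distance": 0, "seconds": 0}
        elif current is not None and was_driving == 1:
            # The interval leading up to this row was driven.
            current["distance"] += dist
            current["seconds"] += secs
            if is_driving == 0:
                # End of the ride.
                rides.append(current)
                current = None
    return rides


rides = find_rides(rows)
for ride in rides:
    print(ride)
```

Under that assumption, the first ten rows contain two rides, starting at 08:01 and 08:05.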
Ideally, my output would look like this:
DriverId StartOfRide DistanceOfRide LengthOfRide
1 11/10/2018 08:00 1400 221
1 11/10/2018 08:30 5900 329
1 11/10/2018 12:00 21400 3600
Answer 0 (score: 1)
Maybe this will do it; remove or add columns as needed:
create table #tmp (DriverId int, DateStamp datetime, IsDriving int, WasDriving int, DistanceSincePrev float, SecondsSincePrev float)
insert into #tmp values
(1, '11/10/2018 08:00', 0, 0, 0, 12),
(1, '11/10/2018 08:01', 1, 0, 10, 60),
(1, '11/10/2018 08:01', 1, 1, 100, 54),
(1, '11/10/2018 08:02', 1, 1, 14, 32),
(1, '11/10/2018 08:03', 1, 1, 33, 60),
(1, '11/10/2018 08:04', 0, 1, 10, 59),
(1, '11/10/2018 08:04', 0, 0, 0, 60),
(1, '11/10/2018 08:05', 1, 0, 0, 60),
(1, '11/10/2018 08:06', 1, 1, 500, 43),
(1, '11/10/2018 08:06', 0, 1, 300, 32),
(1, '11/10/2018 08:07', 0, 0, 0, 60),
(1, '11/10/2018 08:08', 0, 0, 0, 12),
(1, '11/10/2018 08:09', 0, 0, 10, 60),
(1, '11/10/2018 08:10', 0, 0, 100, 54),
(1, '11/10/2018 08:11', 0, 0, 14, 32),
(1, '11/10/2018 08:12', 0, 0, 33, 60),
(1, '11/10/2018 08:13', 0, 0, 10, 59)
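-- Note: DateStamp repeats in the sample (e.g. 08:01), so ORDER BY DateStamp
-- alone does not fix the order of tied rows.
select * from
(
    select DateStamp as RideStart, DriverId, IsDriving, Grp,
        -- Grp is only unique together with IsDriving, so partition by both
        sum(DistanceSincePrev) over (partition by DriverId, IsDriving, Grp) as DistanceOfRide,
        sum(SecondsSincePrev) over (partition by DriverId, IsDriving, Grp) as LengthOfRide,
        row_number() over (partition by DriverId, IsDriving, Grp order by DateStamp) as r
    from
    (
        -- The difference of the two row numbers is constant within each
        -- run of consecutive rows that share the same IsDriving value.
        select *,
            Grp = row_number() over (partition by DriverId order by DateStamp)
                - row_number() over (partition by DriverId, IsDriving order by DateStamp)
        from #tmp
    ) s
) x
where r = 1          -- first row of each run
  and IsDriving = 1  -- keep only driving runs; drop this to see idle runs too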
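One caveat with the row-number-difference technique is that the difference on its own is not a unique group key: a driving run and an idle run can end up with the same value. The small check below illustrates this on an alternating 0, 1, 0, 1 pattern. It is only a sketch: it runs on SQLite through Python's sqlite3 module rather than on SQL Server (window functions need SQLite 3.25 or later), and the seq column is a hypothetical tie-breaker that is not part of the original dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical ordering column, not present in the original data.
conn.execute("CREATE TABLE t (seq INTEGER, DriverId INTEGER, IsDriving INTEGER)")
# Alternating pattern 0, 1, 0, 1: four separate runs of length one.
conn.executemany(
    "INSERT INTO t VALUES (?, 1, ?)",
    [(1, 0), (2, 1), (3, 0), (4, 1)],
)

query = """
SELECT seq, IsDriving,
       ROW_NUMBER() OVER (PARTITION BY DriverId ORDER BY seq)
     - ROW_NUMBER() OVER (PARTITION BY DriverId, IsDriving ORDER BY seq) AS Grp
FROM t
ORDER BY seq
"""
result = conn.execute(query).fetchall()

# Grouping by Grp alone merges rows 2 and 3, which belong to different runs.
groups_by_grp = {grp for _, _, grp in result}
# Grouping by (IsDriving, Grp) keeps all four runs apart.
groups_by_both = {(is_driving, grp) for _, is_driving, grp in result}

print(result)
print(len(groups_by_grp), len(groups_by_both))
```

Here rows 2 and 3 both get Grp = 1 even though one is driving and the other is not, which is why IsDriving belongs in the partition or group-by key alongside Grp.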
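Answer 1 (score: 1)
You need to assign groups and then aggregate. In this case, you can define a group as the number of IsDriving values equal to 0 up to each record. Then aggregate: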
select driverid, min(datestamp) as startofride,
       sum(distancesinceprev) as distance,
       sum(secondssinceprev) as seconds
from (select t.*,
             -- running count of rows with isdriving = 0, used as the group key
             sum(1 - isdriving) over (partition by driverid order by datestamp) as grp
      from t
     ) t
group by driverid, grp
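The same assign-groups-then-aggregate pattern can also follow the question's first idea more directly, by taking a running count of ride starts (IsDriving = 1 and WasDriving = 0) instead of a running count of zeros. The sketch below is one possible variant, not part of the answer above. It runs on SQLite through Python's sqlite3 module instead of SQL Server, adds a hypothetical seq column to break ties between equal timestamps, uses the made-up alias RideNo, and assumes that only intervals with WasDriving = 1 count toward a ride's distance and duration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical tie-breaker; DateStamp alone has duplicate values.
conn.execute(
    "CREATE TABLE t (seq INTEGER, DriverId INTEGER, DateStamp TEXT, "
    "IsDriving INTEGER, WasDriving INTEGER, "
    "DistanceSincePrev INTEGER, SecondsSincePrev INTEGER)"
)
# First ten rows of the sample data, in their original order.
sample = [
    ("11/10/2018 08:00", 0, 0, 0, 12),
    ("11/10/2018 08:01", 1, 0, 10, 60),
    ("11/10/2018 08:01", 1, 1, 100, 54),
    ("11/10/2018 08:02", 1, 1, 14, 32),
    ("11/10/2018 08:03", 1, 1, 33, 60),
    ("11/10/2018 08:04", 0, 1, 10, 59),
    ("11/10/2018 08:04", 0, 0, 0, 60),
    ("11/10/2018 08:05", 1, 0, 0, 60),
    ("11/10/2018 08:06", 1, 1, 500, 43),
    ("11/10/2018 08:06", 0, 1, 300, 32),
]
conn.executemany(
    "INSERT INTO t VALUES (?, 1, ?, ?, ?, ?, ?)",
    [(seq, *row) for seq, row in enumerate(sample, start=1)],
)

query = """
SELECT DriverId,
       -- MIN on text works here only because all sample rows share
       -- the same date and format
       MIN(DateStamp) AS StartOfRide,
       SUM(CASE WHEN WasDriving = 1 THEN DistanceSincePrev ELSE 0 END) AS DistanceOfRide,
       SUM(CASE WHEN WasDriving = 1 THEN SecondsSincePrev ELSE 0 END) AS LengthOfRide
FROM (
    SELECT t.*,
           -- running count of ride starts: each start opens a new group
           SUM(CASE WHEN IsDriving = 1 AND WasDriving = 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY DriverId ORDER BY seq) AS RideNo
    FROM t
) s
WHERE IsDriving = 1 OR WasDriving = 1   -- drop fully idle rows
GROUP BY DriverId, RideNo
ORDER BY StartOfRide
"""
rides = conn.execute(query).fetchall()
for ride in rides:
    print(ride)
```

Filtering out rows where both flags are 0 keeps idle records from being attached to the preceding ride, and the closing record (IsDriving = 0, WasDriving = 1) stays in the group so its last driven interval is counted.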