SQL duration calculation

Posted: 2010-12-07 13:22:09

Tags: sql-server tsql sql-server-2008 azure-sql-database

I have a table of historical bus positions, recorded once per second. The schema looks like this:

BusID        int         not null,
BreadcrumbID int         not null identity (1, 1),
BusStopID    int         null,
Timestamp    datetime    not null

I want to build a bus stop timetable from historical trips. A bus is "at" a stop if its BusStopID corresponds to a stop, and is not "at" a stop if BusStopID is null.

I need to generate the average time at which the bus is at each stop. So basically, I need to do the following:

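  • Determine when the bus is stopped: a simple where clause does the trick
  • Determine the average time at which the bus is stopped. For my purposes, I define a discrete "stop occurrence" as a window of plus or minus 10 minutes. If a bus stops from 10:04-10:08 one day, from 10:06-10:08 another day, and from 10:14-10:18 on a third day, those are the same stop occurrence. If it stops at 10:45-10:48, that is a different stop occurrence.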
  • Filter out "noise", i.e. stops that happened only a few times and never again
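The following Python sketch (not T-SQL) applies the clustering rule from the second and third bullets to the example intervals in the question. It makes three assumptions. Occurrences are formed by chaining start times, so a new occurrence begins only when a stop starts more than 10 minutes after the previous one. Any occurrence seen fewer than two times counts as noise. Each occurrence has one interval per day. `group_occurrences`, `drop_noise` and its `min_days` threshold, and `average_start` are hypothetical helpers.

```python
from datetime import time


def to_minutes(t):
    """Minutes since midnight for a datetime.time."""
    return t.hour * 60 + t.minute


def group_occurrences(intervals, window_minutes=10):
    """Group (start, end) stop intervals observed on different days.

    Assumption: intervals are sorted by start time and chained, so a new
    occurrence begins only when a start is more than `window_minutes`
    after the previous start.
    """
    groups = []
    prev = None
    for start, end in sorted(intervals, key=lambda iv: to_minutes(iv[0])):
        if prev is None or to_minutes(start) - prev > window_minutes:
            groups.append([])
        groups[-1].append((start, end))
        prev = to_minutes(start)
    return groups


def drop_noise(groups, min_days=2):
    """Discard occurrences seen fewer than `min_days` times (hypothetical threshold)."""
    return [g for g in groups if len(g) >= min_days]


def average_start(group):
    """Average arrival time of one occurrence, rounded to the minute."""
    m = round(sum(to_minutes(s) for s, _ in group) / len(group))
    return time(m // 60, m % 60)


# The four stops described in the question
observed = [
    (time(10, 4), time(10, 8)),
    (time(10, 6), time(10, 8)),
    (time(10, 14), time(10, 18)),
    (time(10, 45), time(10, 48)),
]
occurrences = group_occurrences(observed)
kept = drop_noise(occurrences)
print(len(occurrences), [average_start(g) for g in kept])
```

Chaining has a side effect: a run of stops, each a few minutes later than the last, can drift well past 10 minutes from the first stop. If that matters, anchor every stop to the first start of its group instead.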

I'm completely stuck on how to do the second and third bullets. Please help!

4 answers:

Answer 0 (score: 2)

This post I just came across may help you. (Sql Server Central)

Answer 1 (score: 2)

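I've done similar things many times. Basically, it is grouping based on the gaps within a complex ordering. For this problem, my approach works like this: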

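  1. Build a table containing all the time ranges of interest.
  2. Find the start time of each group of time ranges of interest.
  3. Find the end time of each group of time ranges of interest.
  4. Join the start and end times back to the list of time ranges and group.

Or, in more detail (each of these steps could be part of one large CTE, but for readability I've broken them into temp tables):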

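    Step 1: Find all the time ranges of interest (the method I used is similar to the one @Brad linked). Note: as @Manfred Sorg pointed out, this assumes there are no "missing seconds" in the bus data. If the timestamps have a gap, this code will interpret a single range as two (or more) distinct ranges.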

    ;with stopSeconds as (
      select BusID, BusStopID, TimeStamp,
             [date] = cast(datediff(dd,0,TimeStamp) as datetime),
             [grp] = dateadd(ss, -row_number() over(partition by BusID order by TimeStamp), TimeStamp)
      from #test
      where BusStopID is not null
    )
    select BusID, BusStopID, date,
           [sTime] = dateadd(ss,datediff(ss,date,min(TimeStamp)), 0),
           [eTime] = dateadd(ss,datediff(ss,date,max(TimeStamp)), 0),
           [secondsOfStop] = datediff(ss, min(TimeStamp), max(Timestamp)),
           [sOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss,date,min(TimeStamp))),
           [eOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss,date,max(TimeStamp)))
    into #ranges
    from stopSeconds
    group by BusID, BusStopID, date, grp
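The `grp` column uses the usual gaps-and-islands trick. For one bus, subtracting the row number (in seconds) from each timestamp gives the same value for every reading in an unbroken run of seconds. The Python sketch below mirrors step 1 on in-memory tuples to show this. It also demonstrates the caveat in the note above: a NULL or missing second splits one stop into two ranges. The function name `stop_ranges` and the `(bus_id, stop_id, ts)` tuple layout are assumptions for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby


def stop_ranges(rows):
    """Collapse per-second (bus_id, stop_id, ts) readings into stop ranges."""
    # where BusStopID is not null, then order by bus and time
    stopped = sorted((r for r in rows if r[1] is not None),
                     key=lambda r: (r[0], r[2]))
    keyed = []
    row_number = {}
    for bus_id, stop_id, ts in stopped:
        # row_number() over (partition by BusID order by TimeStamp)
        row_number[bus_id] = row_number.get(bus_id, 0) + 1
        # ts minus its row number (in seconds) stays constant while readings
        # are exactly one second apart, so it works as an island key
        grp = ts - timedelta(seconds=row_number[bus_id])
        keyed.append(((bus_id, stop_id, ts.date(), grp), ts))
    ranges = []
    # group by BusID, BusStopID, date, grp
    for key, items in groupby(sorted(keyed), key=lambda kv: kv[0]):
        stamps = [ts for _, ts in items]
        ranges.append((key[0], key[1], min(stamps), max(stamps)))
    return ranges


base = datetime(2010, 1, 1, 10, 0, 0)
rows = [(1, 7, base + timedelta(seconds=s)) for s in (0, 1, 2, 4, 5)]
rows.append((1, None, base + timedelta(seconds=3)))  # between stops
for r in stop_ranges(rows):
    print(r)
```

Removing the NULL reading at 10:00:03 produces the same two ranges, which is exactly the "missing seconds" behavior the note warns about.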
    

    Step 2: Find the earliest start time of each stop

    select this.BusID, this.BusStopID, this.sTime minSTime,
           [stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.sTime)
    into #starts
    from #ranges this
      left join #ranges prev on this.BusID = prev.BusID
                            and this.BusStopID = prev.BusStopID
                            and this.sOrd = prev.sOrd+1
                            and this.sTime between dateadd(mi,-10,prev.sTime) and dateadd(mi,10,prev.sTime)
    where prev.BusID is null
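Step 2 keeps a range only when the previous range for the same bus and stop (in start-time order) does not start within 10 minutes of it. Below is a minimal Python sketch of that rule for one bus and stop. It assumes times of day are held as `timedelta` offsets from midnight; `cluster_starts` and `hm` are hypothetical helpers.

```python
from datetime import timedelta

WINDOW = timedelta(minutes=10)


def cluster_starts(start_times, window=WINDOW):
    """Keep start times whose predecessor (in start order) is more than
    `window` earlier, i.e. the rows where the step-2 left join finds no prev.
    SQL BETWEEN is inclusive, so a gap of exactly `window` still joins."""
    ordered = sorted(start_times)
    return [t for i, t in enumerate(ordered)
            if i == 0 or t - ordered[i - 1] > window]


def hm(h, m):
    """Time of day as an offset from midnight."""
    return timedelta(hours=h, minutes=m)


starts = cluster_starts([hm(7, 58), hm(8, 3), hm(8, 31)])
print(starts)
```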
    

    Step 3: Find the latest end time of each stop

    select this.BusID, this.BusStopID, this.eTime maxETime,
           [stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.eTime)
    into #ends
    from #ranges this
      left join #ranges next on this.BusID = next.BusID
                            and this.BusStopID = next.BusStopID
                            and this.eOrd = next.eOrd-1
                            and this.eTime between dateadd(mi,-10,next.eTime) and dateadd(mi,10,next.eTime)
    where next.BusID is null
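Step 3 is the mirror image of step 2. In end-time order, a range closes a group when the next range does not end within 10 minutes of it. Here is the same kind of sketch, again for one bus and stop, with the hypothetical helpers `cluster_ends` and `hm`:

```python
from datetime import timedelta

WINDOW = timedelta(minutes=10)


def cluster_ends(end_times, window=WINDOW):
    """Keep end times whose successor (in end order) is more than `window`
    later, i.e. the rows where the step-3 left join finds no next."""
    ordered = sorted(end_times)
    last = len(ordered) - 1
    return [t for i, t in enumerate(ordered)
            if i == last or ordered[i + 1] - t > window]


def hm(h, m):
    """Time of day as an offset from midnight."""
    return timedelta(hours=h, minutes=m)


ends = cluster_ends([hm(8, 1), hm(8, 7), hm(8, 35)])
print(ends)
```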
    

    Step 4: Join everything together

    select r.BusID, r.BusStopID,
           [avgLengthOfStop] = avg(datediff(ss,r.sTime,r.eTime)),
           [earliestStop] = min(r.sTime),
           [latestDepart] = max(r.eTime)
    from #starts s
      join #ends e on s.BusID=e.BusID
                  and s.BusStopID=e.BusStopID
                  and s.stopOrder=e.stopOrder
      join #ranges r on r.BusID=s.BusID
                    and r.BusStopID=s.BusStopID
                    and r.sTime between s.minSTime and e.maxETime
                    and r.eTime between s.minSTime and e.maxETime
    group by r.BusID, r.BusStopID, s.stopOrder
    having count(distinct r.date) > 1 --filters out the "noise"
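Step 4 pairs the k-th start with the k-th end (the join on `stopOrder`), collects the ranges that fall entirely inside each pair, and keeps only groups seen on more than one date. The Python sketch below shows that aggregation for one bus and stop. It assumes steps 2 and 3 produce matching numbers of starts and ends, as the join expects. `summarize` and the `(run_date, s_time, e_time)` tuple layout are illustrative and do not come from the answer.

```python
from datetime import date, timedelta


def summarize(ranges, starts, ends):
    """ranges: (run_date, s_time, e_time) tuples for one bus and stop.
    starts/ends: group bounds from steps 2 and 3, paired by position.
    Returns (avg_seconds, earliest, latest) per group seen on >1 date."""
    out = []
    for first, last in zip(sorted(starts), sorted(ends)):
        # both ends of a range must fall inside the group bounds
        members = [r for r in ranges
                   if first <= r[1] <= last and first <= r[2] <= last]
        if len({r[0] for r in members}) <= 1:
            continue  # having count(distinct r.date) > 1: drop the noise
        secs = [int((e - s).total_seconds()) for _, s, e in members]
        out.append((sum(secs) // len(secs),  # avg() over int truncates
                    min(s for _, s, _ in members),
                    max(e for _, _, e in members)))
    return out


def hm(h, m):
    """Time of day as an offset from midnight."""
    return timedelta(hours=h, minutes=m)


ranges = [
    (date(2010, 1, 1), hm(7, 58), hm(8, 1)),
    (date(2010, 1, 2), hm(8, 3), hm(8, 7)),
    (date(2010, 1, 3), hm(8, 31), hm(8, 35)),
]
result = summarize(ranges, starts=[hm(7, 58), hm(8, 31)],
                   ends=[hm(8, 7), hm(8, 35)])
print(result)
```

The 08:31 group is seen on only one date, so it is dropped as noise. The other group averages 180 s and 240 s of dwell time.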
    

    Finally, for completeness, clean up:

    drop table #ends
    drop table #starts
    drop table #ranges
    

Answer 2 (score: 0)

New answer...

Try this:

DECLARE @stopWindowMinutes INT
SET @stopWindowMinutes = 10

--
;
WITH    test_data
          AS ( SELECT   1 [BusStopId]
                       ,'2010-01-01 10:00:04' [Timestamp]
               UNION SELECT   1,'2010-01-01 10:00:05'
               UNION SELECT   1,'2010-01-01 10:00:06'
               UNION SELECT   1,'2010-01-01 10:00:07'
               UNION SELECT   1,'2010-01-01 10:00:08'
               UNION SELECT   1,'2010-01-02 10:00:06'
               UNION SELECT   1,'2010-01-02 10:00:07'
               UNION SELECT   1,'2010-01-02 10:00:08'
               UNION SELECT   2,'2010-01-01 10:00:06'
               UNION SELECT   2,'2010-01-01 10:00:07'
               UNION SELECT   2,'2010-01-01 10:00:08'
               UNION SELECT   2,'2010-01-01 10:00:09'
               UNION SELECT   2,'2010-01-01 10:00:10'
               UNION SELECT   2,'2010-01-01 10:00:09'
               UNION SELECT   2,'2010-01-01 10:00:10'
               UNION SELECT   2,'2010-01-01 10:00:11'
               UNION SELECT   1,'2010-01-02 10:33:43'
               UNION SELECT   1,'2010-01-02 10:33:44'
               UNION SELECT   1,'2010-01-02 10:33:45'
               UNION SELECT   1,'2010-01-02 10:33:46'
             )
    SELECT DISTINCT
            [BusStopId]
           ,[AvgStop]
    FROM    ( SELECT    [a].[BusStopId]
                       ,( SELECT    MIN([b].[Timestamp])
                          FROM      [test_data] b
                          WHERE     [a].[BusStopId] = [b].[BusStopId]
                                    AND CONVERT(VARCHAR(10), [a].[Timestamp], 120) = CONVERT(VARCHAR(10), [b].[Timestamp], 120)
                                    AND [b].[Timestamp] BETWEEN DATEADD(SECOND, -@stopWindowMinutes * 60,
                                                                        [a].[Timestamp])
                                                        AND     DATEADD(SECOND, @stopWindowMinutes * 60, [a].[Timestamp]) -- w/i X minutes

                        ) [MinStop]
                       ,( SELECT    MAX([b].[Timestamp])
                          FROM      [test_data] b
                          WHERE     [a].[BusStopId] = [b].[BusStopId]
                                    AND CONVERT(VARCHAR(10), [a].[Timestamp], 120) = CONVERT(VARCHAR(10), [b].[Timestamp], 120)
                                    AND [b].[Timestamp] BETWEEN DATEADD(SECOND, -@stopWindowMinutes * 60,
                                                                        [a].[Timestamp])
                                                        AND     DATEADD(SECOND, @stopWindowMinutes * 60, [a].[Timestamp]) -- w/i X minutes

                        ) [MaxStop]
                       ,( SELECT    DATEADD(second,
                                            AVG(DATEDIFF(second, CONVERT(VARCHAR(10), [b].[Timestamp], 120),
                                                         [b].[Timestamp])),
                                            CONVERT(VARCHAR(10), MIN([b].[Timestamp]), 120))
                          FROM      [test_data] b
                          WHERE     [a].[BusStopId] = [b].[BusStopId]
                                    AND CONVERT(VARCHAR(10), [a].[Timestamp], 120) = CONVERT(VARCHAR(10), [b].[Timestamp], 120)
                                    AND [b].[Timestamp] BETWEEN DATEADD(SECOND, -@stopWindowMinutes * 60,
                                                                        [a].[Timestamp])
                                                        AND     DATEADD(SECOND, @stopWindowMinutes * 60, [a].[Timestamp]) -- w/i X minutes

                        ) [AvgStop]
              FROM      [test_data] a
              GROUP BY  [a].[BusStopId]
                       ,[a].[Timestamp]
            ) subset1
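For every reading, the correlated subqueries compute statistics over readings at the same stop, on the same calendar day, within `@stopWindowMinutes` of it. Below is a Python sketch of the `[AvgStop]` column on the answer's test data; the `avg_stops` and `at` names are hypothetical. Because of the same-day condition, stops on different days are never merged into a single timetable entry.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # @stopWindowMinutes


def avg_stops(rows, window=WINDOW):
    """rows: (bus_stop_id, timestamp) pairs. Returns the distinct
    (bus_stop_id, avg_stop) pairs that the query selects."""
    rows = set(rows)  # UNION removes duplicate rows
    result = set()
    for stop, ts in rows:
        # same stop, same calendar day, within +/- window of this reading
        near = [b for s, b in rows
                if s == stop and b.date() == ts.date() and abs(b - ts) <= window]
        midnight = datetime.combine(ts.date(), datetime.min.time())
        secs = [int((b - midnight).total_seconds()) for b in near]
        # integer AVG in T-SQL truncates; // matches for positive values
        result.add((stop, midnight + timedelta(seconds=sum(secs) // len(secs))))
    return result


def at(day, hh, mm, ss):
    return datetime(2010, 1, day, hh, mm, ss)


test_data = (
    [(1, at(1, 10, 0, s)) for s in range(4, 9)]
    + [(1, at(2, 10, 0, s)) for s in range(6, 9)]
    + [(2, at(1, 10, 0, s)) for s in (6, 7, 8, 9, 10, 9, 10, 11)]
    + [(1, at(2, 10, 33, s)) for s in range(43, 47)]
)
for stop, avg in sorted(avg_stops(test_data)):
    print(stop, avg)
```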

Answer 3 (score: 0)

As is often the case, problems like this are easier to solve and maintain when you break them into small pieces:

-- Split into Date and minutes-since-midnight
WITH observed(dates,arrival,busstop,bus) AS (
    SELECT
        CONVERT(CHAR(8), TimeStamp, 112),
        DATEPART(HOUR,TimeStamp) * 60 + DATEPART(MINUTE,TimeStamp),
        busstopid,
        busid
    FROM
        History
    WHERE
        busstopid IS NOT NULL  -- only readings taken at a stop
),
-- Identify times at stop subsequent to arrival at that stop
atstop(dates,stoptime,busstop,bus) AS (
    SELECT
        a.dates,
        a.arrival,
        a.busstop,
        a.bus
    FROM
        observed a 
    WHERE
        EXISTS (
            SELECT 
                *
            FROM
                observed b
            WHERE
                a.dates = b.dates AND
                a.busstop = b.busstop AND
                a.bus = b.bus AND
                a.arrival - b.arrival BETWEEN 1 AND 10
        )
),
-- Isolate actual arrivals at stops, excluding waiting at stops
dailyhalts(dates,arrival,busstop,bus) AS (
    SELECT
        a.dates,a.arrival,a.busstop,a.bus
    FROM
        observed a 
    WHERE
        arrival NOT IN (
            SELECT
                stoptime
            FROM 
                atstop b 
            WHERE
                a.dates = b.dates AND
                a.busstop = b.busstop AND
                a.bus = b.bus 
    )
),
-- Merge arrivals across all dates
timetable(busstop,bus,arrival) AS (
    SELECT
        a.busstop, a.bus, a.arrival
    FROM
        dailyhalts a 
    WHERE
        NOT EXISTS (
            SELECT  
                *
            FROM
                dailyhalts h 
            WHERE
                a.busstop = h.busstop AND
                a.bus = h.bus AND
                a.arrival - h.arrival BETWEEN 1 AND 10
        )
    GROUP BY
        a.busstop, a.bus, a.arrival
)
-- Print timetable for a given day
SELECT
    a.busstop, a.bus, a.arrival, DATEADD(minute,AVG(b.arrival),'2010/01/01')
FROM
    timetable a INNER JOIN
    observed b ON
        a.busstop = b.busstop AND
        a.bus = b.bus AND
        b.arrival BETWEEN a.arrival AND a.arrival + 10
GROUP BY
    a.busstop, a.bus, a.arrival

Input:

ID  BusID   BusStopID   TimeStamp
1   1   1   2010-01-01 10:00:00.000
2   1   1   2010-01-01 10:01:00.000
3   1   1   2010-01-01 10:02:00.000
4   1   2   2010-01-01 11:00:00.000
5   1   3   2010-01-01 12:00:00.000
6   1   3   2010-01-01 12:01:00.000
7   1   3   2010-01-01 12:02:00.000
8   1   3   2010-01-01 12:03:00.000
9   1   1   2010-01-02 11:00:00.000
10  1   1   2010-01-02 11:03:00.000
11  1   1   2010-01-02 11:07:00.000
12  1   2   2010-01-02 12:00:00.000
13  1   3   2010-01-02 13:00:00.000
14  1   3   2010-01-02 13:01:00.000
15  1   1   2010-01-03 10:03:00.000
16  1   1   2010-01-03 10:05:00.000

Output:

busstop bus arrival (No column name)
1   1   600 2010-01-01 10:02:00.000
1   1   660 2010-01-01 11:03:00.000
2   1   660 2010-01-01 11:00:00.000
2   1   720 2010-01-01 12:00:00.000
3   1   720 2010-01-01 12:01:00.000
3   1   780 2010-01-01 13:00:00.000
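To check the CTE chain against the sample above, here is a Python re-implementation of the same steps: `observed`, `atstop`, `dailyhalts`, `timetable`, and the final average. It runs on the input table. It is a sketch for reasoning about the logic, not a replacement for the query. The helper names `parse` and `timetable` are hypothetical, and the averaged arrival is formatted as `HH:MM` rather than as a datetime on 2010-01-01. On this input it produces the same six rows as the output shown.

```python
INPUT = """\
1 1 2010-01-01 10:00
1 1 2010-01-01 10:01
1 1 2010-01-01 10:02
1 2 2010-01-01 11:00
1 3 2010-01-01 12:00
1 3 2010-01-01 12:01
1 3 2010-01-01 12:02
1 3 2010-01-01 12:03
1 1 2010-01-02 11:00
1 1 2010-01-02 11:03
1 1 2010-01-02 11:07
1 2 2010-01-02 12:00
1 3 2010-01-02 13:00
1 3 2010-01-02 13:01
1 1 2010-01-03 10:03
1 1 2010-01-03 10:05"""


def parse(text):
    """Rows of 'BusID BusStopID date HH:MM' -> (bus, stop, date, minutes)."""
    rows = []
    for line in text.splitlines():
        bus, stop, day, hhmm = line.split()
        h, m = map(int, hhmm.split(":"))
        rows.append((int(bus), int(stop), day, h * 60 + m))
    return rows


def timetable(history, window=10):
    # observed: (date, minute, stop, bus), NULL stops dropped
    observed = [(d, m, s, b) for b, s, d, m in history if s is not None]
    # atstop: readings 1..window minutes after another reading, same day/stop/bus
    atstop = {(d, m, s, b) for d, m, s, b in observed
              if any((d2, s2, b2) == (d, s, b) and 1 <= m - m2 <= window
                     for d2, m2, s2, b2 in observed)}
    # dailyhalts: readings that are arrivals rather than waiting time
    halts = [o for o in observed if o not in atstop]
    # timetable: halts with no other halt 1..window minutes earlier on any date
    starts = sorted({(s, b, m) for _, m, s, b in halts
                     if not any((s2, b2) == (s, b) and 1 <= m - m2 <= window
                                for _, m2, s2, b2 in halts)})
    result = []
    for s, b, m in starts:
        # average of every reading in [arrival, arrival + window], across dates
        mins = [m2 for _, m2, s2, b2 in observed
                if (s2, b2) == (s, b) and m <= m2 <= m + window]
        avg = sum(mins) // len(mins)  # integer AVG truncates
        result.append((s, b, m, f"{avg // 60:02d}:{avg % 60:02d}"))
    return result


for row in timetable(parse(INPUT)):
    print(*row)
```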