SQL timestamp difference between rows based on an additional comment/column

Time: 2014-10-20 14:32:25

Tags: sql sql-server-2012

I have the following data in a single table (MS SQL Server 2012):

cinderellaID statusName                timestamp
------------ ------------------------- -----------------------
10459        Waiting                   2013-03-16 12:03:17.000
10459        Paired                    2013-03-16 12:29:50.000
10459        Shopping                  2013-03-16 12:29:22.233
10459        Checked Out               2013-03-16 14:01:24.000
10461        Alterations               1988-01-02 01:47:07.000
10461        Checked Out               2013-03-16 14:42:25.000
10461        Paired                    2013-03-16 12:29:31.000
10461        Shopping                  2013-03-16 12:29:01.437
10461        Waiting                   2013-03-16 11:52:18.000
10462        Waiting                   2013-03-16 12:19:35.000
10462        Shopping                  2013-03-16 12:59:01.197
10462        Paired                    2013-03-16 12:59:28.000
10462        Checked Out               2013-03-16 14:35:44.000
10463        Checked Out               2013-03-16 12:22:20.000
10463        Waiting                   2013-03-16 10:44:14.000
10463        Paired                    2013-03-16 11:00:37.000
10463        Shopping                  2013-03-16 11:00:23.063
10464        Waiting                   2013-03-16 10:44:03.000
10464        Paired                    2013-03-16 10:59:32.000
10464        Shopping                  2013-03-16 10:59:02.560
10464        Alterations               1988-01-02 00:44:02.000
10464        Checked Out               2013-03-16 13:18:21.000
10465        Checked Out               2013-03-16 11:54:34.000
10465        Waiting                   2013-03-16 09:44:13.000
10465        Paired                    2013-03-16 10:08:05.000
10465        Shopping                  2013-03-16 10:10:58.323
10466        Waiting                   2013-03-16 12:13:51.000
10466        Shopping                  2013-03-16 12:46:56.207
10466        Paired                    2013-03-16 12:46:43.000
10467        Shopping                  2013-03-16 10:08:06.553
10467        Paired                    2013-03-16 10:04:49.000
10467        Waiting                   2013-03-16 09:41:03.000
<much more data ...>

The data is shown ordered by cinderellaID here, but only to make the question easier to follow.
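To reproduce the problem, a minimal setup sketch follows. The table name YourTable and the column types are assumptions (the question gives neither); the rows are copied from the sample above for cinderellaID 10459 and 10464.

```sql
-- Assumed schema: the column types are guesses based on the printed output.
CREATE TABLE YourTable
(
    cinderellaID int         NOT NULL,
    statusName   varchar(25) NOT NULL,
    [timestamp]  datetime    NOT NULL   -- bracketed because timestamp is also a type name
);

-- ISO 8601 literals (with the T separator) are parsed the same way
-- regardless of the session's language and DATEFORMAT settings.
INSERT INTO YourTable (cinderellaID, statusName, [timestamp])
VALUES
    (10459, 'Waiting',     '2013-03-16T12:03:17.000'),
    (10459, 'Paired',      '2013-03-16T12:29:50.000'),
    (10459, 'Shopping',    '2013-03-16T12:29:22.233'),
    (10459, 'Checked Out', '2013-03-16T14:01:24.000'),
    (10464, 'Waiting',     '2013-03-16T10:44:03.000'),
    (10464, 'Paired',      '2013-03-16T10:59:32.000'),
    (10464, 'Shopping',    '2013-03-16T10:59:02.560'),
    (10464, 'Alterations', '1988-01-02T00:44:02.000'),
    (10464, 'Checked Out', '2013-03-16T13:18:21.000');
```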

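These are transactions showing when someone (identified by cinderellaID) entered each status. For example, in row 1, Cinderella 10459 entered the "Waiting" stage at 2013-03-16 12:03:17.000. There is always (or should be) a flow in the data: Waiting always transitions to Paired, Paired to Shopping, and Shopping to either Checked Out or Alterations. If it goes Shopping -> Alterations, it then goes Alterations -> Checked Out. I know that not all of the data was captured, but that is fine with me.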

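What I want is a way to calculate the average time spent in each stage. For example, how long did each person spend "Waiting" before moving to "Paired"? How long did each person spend "Paired" before moving to "Shopping"? Ideally my output would look like this (I made up the data):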

status        avgTimeSpent
------------- -----------------
Waiting       1:00:04
Paired        0:20:22
Shopping      1:30:11
...

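I'm familiar with grouping and what I'd call "plain SQL" like that, but I'm not familiar with how to do operations across rows, which I think is what I need to solve this. Any help?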

2 Answers:

Answer 0: (score: 1)

Something like this should work:

-- Pair each row (t1) with the next row in time (t2) for the same person.
SELECT
    t1.cinderellaID,
    t1.statusName,
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) As AvgTime  -- in seconds
FROM        YourTable As t1
INNER JOIN  YourTable As t2
    ON  t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    -- t2 is the immediate successor only if no row t3 falls between t1 and t2
    AND NOT EXISTS(Select * From YourTable As t3
                   Where t3.cinderellaID = t1.cinderellaID
                     And t3.timestamp < t2.timestamp
                     And t3.timestamp > t1.timestamp)
GROUP BY t1.cinderellaID, t1.statusName

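This query uses no window functions, so it should port to most SQL dialects (only the DATEDIFF call is dialect-specific). There is a more efficient query that uses the ROW_NUMBER() OVER(..) function, but not every SQL dialect supports it.

I see you have the sql-server-2012 tag, which does support this function, so here it is: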

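;WITH cte As
(
    -- Number each person's rows in time order. Partitioning by statusName
    -- as well would leave every row at rowNum = 1, so the join below
    -- would find no consecutive pairs.
    SELECT *,
        ROW_NUMBER() OVER(
                        PARTITION BY cinderellaID
                        ORDER BY timestamp) As rowNum
    FROM YourTable
)
SELECT
    t1.cinderellaID,
    t1.statusName,
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) As AvgTime  -- in seconds
FROM        cte As t1
INNER JOIN  cte As t2
    ON  t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    AND t1.rowNum = t2.rowNum-1   -- t2 is the row right after t1
GROUP BY t1.cinderellaID, t1.statusName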
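The desired output in the question has one row per status rather than one per person and status. A sketch of the ROW_NUMBER() approach adapted to that shape follows; it numbers rows per cinderellaID only and groups by statusName alone. The cast to float is an addition of mine, made because AVG over an int expression returns a truncated int in SQL Server.

```sql
-- Sketch only: YourTable is the placeholder table name used in this answer.
;WITH cte AS
(
    SELECT cinderellaID, statusName, [timestamp],
           ROW_NUMBER() OVER (PARTITION BY cinderellaID
                              ORDER BY [timestamp]) AS rowNum
    FROM YourTable
)
SELECT
    t1.statusName,
    -- Average gap, in seconds, between entering this status and the next row
    AVG(CAST(DATEDIFF(second, t1.[timestamp], t2.[timestamp]) AS float)) AS AvgSeconds
FROM        cte AS t1
INNER JOIN  cte AS t2
    ON  t1.cinderellaID = t2.cinderellaID
    AND t2.rowNum = t1.rowNum + 1      -- t2 is the row right after t1
GROUP BY t1.statusName;
```

The last status recorded for each person has no following row, so it contributes nothing to the averages.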

Answer 1: (score: 1)

You can do what you want with lead(). The basic query to get the information you need is:

select t.*,
       lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname,
       lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
from singletable t;
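To see what lead() produces, the same query can be restricted to a single person. The sketch below assumes the table is called singletable, as in this answer, and contains the four sample rows shown in the question for cinderellaID 10459.

```sql
SELECT t.statusName,
       t.[timestamp],
       LEAD(t.statusName)  OVER (PARTITION BY t.cinderellaID ORDER BY t.[timestamp]) AS next_statusname,
       LEAD(t.[timestamp]) OVER (PARTITION BY t.cinderellaID ORDER BY t.[timestamp]) AS next_timestamp
FROM singletable t
WHERE t.cinderellaID = 10459
ORDER BY t.[timestamp];

-- With the sample rows for 10459, this returns:
-- statusName   timestamp                next_statusname  next_timestamp
-- Waiting      2013-03-16 12:03:17.000  Shopping         2013-03-16 12:29:22.233
-- Shopping     2013-03-16 12:29:22.233  Paired           2013-03-16 12:29:50.000
-- Paired       2013-03-16 12:29:50.000  Checked Out      2013-03-16 14:01:24.000
-- Checked Out  2013-03-16 14:01:24.000  NULL             NULL
```

Note that in the sample data the Shopping row for 10459 carries an earlier timestamp than the Paired row, so ordering by timestamp yields Waiting -> Shopping -> Paired instead of the expected flow. Seeing next_statusname next to statusname makes such out-of-order transitions visible.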

Then, to get the averages:

select statusname, next_statusname,
       avg(datediff(second, timestamp, next_timestamp)) as avg_seconds
from (select t.*,
             lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname,
             lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
      from singletable t
     ) t
group by statusname, next_statusname;
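The question asks for output formatted like 1:00:04, one row per status. A hedged sketch of one way to get there from the lead() query follows. Two things in it are assumptions of mine, not part of the answer: the cutoff date that drops the 1988 timestamps seen on the sample Alterations rows (they look like placeholder values), and the cast to float to avoid integer truncation or overflow inside AVG.

```sql
SELECT statusname,
       -- Turn the average number of seconds into an hh:mm:ss string (style 108).
       CONVERT(varchar(8),
               DATEADD(second,
                       CAST(AVG(CAST(DATEDIFF(second, [timestamp], next_timestamp) AS float)) AS int),
                       0),
               108) AS avgTimeSpent
FROM (SELECT t.*,
             LEAD([timestamp]) OVER (PARTITION BY cinderellaID ORDER BY [timestamp]) AS next_timestamp
      FROM singletable t
      WHERE t.[timestamp] >= '20000101'   -- assumed cutoff: ignore the 1988 rows
     ) t
WHERE next_timestamp IS NOT NULL          -- the last status per person has no successor
GROUP BY statusname;
```

Style 108 prints a zero-padded hour (01:00:04) and wraps around at 24 hours, so this formatting only suits averages shorter than a day.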