I have the following data in a single table (MS SQL Server 2012):
cinderellaID statusName timestamp
------------ ------------------------- -----------------------
10459 Waiting 2013-03-16 12:03:17.000
10459 Paired 2013-03-16 12:29:50.000
10459 Shopping 2013-03-16 12:29:22.233
10459 Checked Out 2013-03-16 14:01:24.000
10461 Alterations 1988-01-02 01:47:07.000
10461 Checked Out 2013-03-16 14:42:25.000
10461 Paired 2013-03-16 12:29:31.000
10461 Shopping 2013-03-16 12:29:01.437
10461 Waiting 2013-03-16 11:52:18.000
10462 Waiting 2013-03-16 12:19:35.000
10462 Shopping 2013-03-16 12:59:01.197
10462 Paired 2013-03-16 12:59:28.000
10462 Checked Out 2013-03-16 14:35:44.000
10463 Checked Out 2013-03-16 12:22:20.000
10463 Waiting 2013-03-16 10:44:14.000
10463 Paired 2013-03-16 11:00:37.000
10463 Shopping 2013-03-16 11:00:23.063
10464 Waiting 2013-03-16 10:44:03.000
10464 Paired 2013-03-16 10:59:32.000
10464 Shopping 2013-03-16 10:59:02.560
10464 Alterations 1988-01-02 00:44:02.000
10464 Checked Out 2013-03-16 13:18:21.000
10465 Checked Out 2013-03-16 11:54:34.000
10465 Waiting 2013-03-16 09:44:13.000
10465 Paired 2013-03-16 10:08:05.000
10465 Shopping 2013-03-16 10:10:58.323
10466 Waiting 2013-03-16 12:13:51.000
10466 Shopping 2013-03-16 12:46:56.207
10466 Paired 2013-03-16 12:46:43.000
10467 Shopping 2013-03-16 10:08:06.553
10467 Paired 2013-03-16 10:04:49.000
10467 Waiting 2013-03-16 09:41:03.000
<much more data ...>
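The data here is ordered by cinderellaID, but only to make this question easier to follow.
These are transactions showing when a person (identified by cinderellaID) entered each status. For example, in row 1, Cinderella 10459 entered the "Waiting" status at 2013-03-16 12:03:17.000. There is always a flow in the data (or there should be): Waiting always transitions to Paired, Paired to Shopping, and Shopping to either Checked Out or Alterations. If it goes Shopping -> Alterations, it then goes Alterations -> Checked Out. I know that not all of the data was captured, but that is fine with me.
What I want is a way to calculate the average time spent in each status. For example, how long does each person spend "Waiting" before moving to "Paired"? How long does each person spend "Paired" before going to "Shopping"? Ideally my output would look like this (I made up the numbers):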
status avgTimeSpent
------------- -----------------
Waiting 1:00:04
Paired 0:20:22
Shopping 1:30:11
...
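I'm familiar with grouping and what I would call "ordinary SQL" like that, but I'm not familiar with operating across rows, which I think is what I need to do to solve this. Any help?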
Answer 0: (score: 1)
Something like this should work:
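-- Pairs each row (t1) with the same person's next row in time (t2).
-- To get one average per status across all people, drop cinderellaID
-- from both the SELECT list and the GROUP BY.
SELECT
    t1.cinderellaID,
    t1.statusName,
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) AS AvgTime
FROM YourTable AS t1
INNER JOIN YourTable AS t2
    ON t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    -- t2 is the immediate successor: no row t3 lies strictly between them
    AND NOT EXISTS (SELECT * FROM YourTable AS t3
                    WHERE t3.cinderellaID = t1.cinderellaID
                      AND t3.timestamp < t2.timestamp
                      AND t3.timestamp > t1.timestamp)
GROUP BY t1.cinderellaID, t1.statusName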
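This query should work on any version of SQL. There is a more efficient query that uses the ROW_NUMBER() OVER(..) function, but not every flavor of SQL supports it.
I see you have the SQL-Server-2012 tag, which does support this function, so here it is: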
;WITH cte AS
(
    SELECT *,
        -- Number each person's rows in time order, so consecutive
        -- rowNum values are consecutive statuses for that person.
        ROW_NUMBER() OVER(
            PARTITION BY cinderellaID
            ORDER BY timestamp) AS rowNum
    FROM YourTable
)
SELECT
    t1.cinderellaID,
    t1.statusName,
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) AS AvgTime
FROM cte AS t1
INNER JOIN cte AS t2
    ON t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    AND t1.rowNum = t2.rowNum - 1
GROUP BY t1.cinderellaID, t1.statusName
Answer 1: (score: 1)
You can do what you want with lead(). The basic query to get the information you need is:
select t.*,
       lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname,
       lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
from singletable t;
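To see what lead() produces, here is a small self-contained sketch that loads the four rows for cinderellaID 10459 from the question into a table variable. The table variable @t and its column types are assumptions made for illustration; the real table definition is not shown in the question.

```sql
-- Hypothetical stand-in for the real table; column types are assumed.
declare @t table (cinderellaID int, statusName varchar(25), [timestamp] datetime);

-- The four rows for cinderellaID 10459, copied from the question.
insert into @t values
    (10459, 'Waiting',     '2013-03-16T12:03:17.000'),
    (10459, 'Paired',      '2013-03-16T12:29:50.000'),
    (10459, 'Shopping',    '2013-03-16T12:29:22.233'),
    (10459, 'Checked Out', '2013-03-16T14:01:24.000');

-- For each row, look ahead to the same person's next row in time order.
select statusName,
       lead(statusName) over (partition by cinderellaID order by [timestamp]) as next_statusname,
       datediff(second, [timestamp],
                lead([timestamp]) over (partition by cinderellaID order by [timestamp])) as seconds_in_status
from @t
order by [timestamp];

-- Result:
--   Waiting      Shopping     1565
--   Shopping     Paired       28
--   Paired       Checked Out  5494
--   Checked Out  NULL         NULL
```

In this sample, Shopping was recorded about 28 seconds before Paired, so ordering by timestamp pairs Waiting with Shopping rather than with Paired. That is one reason to keep next_statusname in the result and group by the (statusname, next_statusname) pair.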
Then, to get the averages:
select statusname, next_statusname,
       avg(datediff(second, timestamp, next_timestamp)) as avg_seconds
from (select t.*,
             lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname,
             lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
      from singletable t
     ) t
group by statusname, next_statusname;
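The question asks for output in an h:mm:ss form such as 1:00:04, while the query above returns whole seconds. One possible way to format it is sketched below. It groups by statusname only, to match the requested output shape, and it casts to bigint before averaging; that cast is my own addition, there because AVG over int values can overflow when some gaps are very large (the 1988 Alterations timestamps in the sample would produce such gaps). The alias avgTimeSpent is taken from the question; everything else follows the query above.

```sql
-- Sketch: average seconds per status, rendered as h:mm:ss.
select statusname,
       cast(avg_seconds / 3600 as varchar(12)) + ':'
       + right('0' + cast((avg_seconds % 3600) / 60 as varchar(2)), 2) + ':'
       + right('0' + cast(avg_seconds % 60 as varchar(2)), 2) as avgTimeSpent
from (select statusname,
             -- bigint avoids int overflow while summing large gaps
             avg(cast(datediff(second, timestamp, next_timestamp) as bigint)) as avg_seconds
      from (select t.*,
                   lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
            from singletable t
           ) t
      -- the last row per person has no successor, so skip it
      where next_timestamp is not null
      group by statusname
     ) a;
```

With this arithmetic, an average of 3604 seconds is rendered as 1:00:04. The hours part is not zero-padded or capped, so very long averages simply show a large hour count.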