I have a table of actions within sessions, with a time (in milliseconds) recorded for each step:
+--------+-----------+-----------------+-------------+--------------+
| userid | sessionid | action sequence | action      | milliseconds |
+--------+-----------+-----------------+-------------+--------------+
| 1      | 1         | 1               | event start | 0            |
| 1      | 1         | 2               | other       | 188114       |
| 1      | 1         | 3               | event end   | 248641       |
| 1      | 1         | 4               | other       | 398215       |
| 1      | 1         | 5               | event start | 488284       |
| 1      | 1         | 6               | other       | 528445       |
| 1      | 1         | 7               | other       | 572711       |
| 1      | 1         | 8               | event end   | 598123       |
| 1      | 2         | 1               | event start | 0            |
| 1      | 2         | 2               | event end   | 54363        |
| 2      | 1         | 1               | other       | 0            |
| 2      | 1         | 2               | other       | 2345         |
| 2      | 1         | 1               | other       | 75647        |
| 3      | 1         | 2               | other       | 0            |
| 3      | 1         | 3               | event start | 34678        |
| 3      | 1         | 4               | other       | 46784        |
| 3      | 1         | 5               | other       | 78905        |
| 4      | 1         | 1               | event start | 0            |
| 4      | 1         | 2               | other       | 7454         |
| 4      | 1         | 3               | other       | 11245        |
| 4      | 1         | 4               | event end   | 24567        |
| 4      | 1         | 5               | other       | 29562        |
| 4      | 1         | 6               | other       | 43015        |
+--------+-----------+-----------------+-------------+--------------+
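I want to capture complete events, meaning sessions that contain both an event start and an event end, along with their start and end times. Some sessions may have a start but no end, an end but no start, or neither; I don't want those. Ultimately, I want to track durations by pivoting the consecutive time rows into columns so that I can compute the difference. Ideally, the table above would be transformed into: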
+--------+-----------+---------------+--------+--------+
| userid | sessionid | full event id | start  | end    |
+--------+-----------+---------------+--------+--------+
| 1      | 1         | 1             | 0      | 248641 |
| 1      | 1         | 2             | 488284 | 598123 |
| 1      | 2         | 1             | 0      | 54363  |
| 4      | 1         | 1             | 0      | 24567  |
+--------+-----------+---------------+--------+--------+
I tried something like this:
select a.userid, a.sessionid, a.milliseconds as start, b.milliseconds as end
from #table a
inner join #table b
on a.userid=b.userid
and a.sessionid=b.sessionid
and a.action='event start'
and b.action='event end'
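However, this doesn't work, because some users have multiple event starts and ends within a single session (such as userid 1). I'm stuck on how best to pivot the time data for each event. Thanks for your help!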
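To make the problem with that join concrete, here is a minimal sketch that runs the same join conditions against only the userid 1 / sessionid 1 rows. It uses Python's built-in `sqlite3` module purely as a convenient test harness; the table name `actions` and the column aliases `start_ms` / `end_ms` are placeholders chosen for this illustration, not names from my real schema.

```python
import sqlite3

# In-memory database used only as a scratch pad for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (userid INT, sessionid INT, "
    "actionSequence INT, action TEXT, milliseconds INT)"
)

# Only the rows for userid 1, sessionid 1 from the table above.
conn.executemany(
    "INSERT INTO actions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "event start", 0),
        (1, 1, 2, "other", 188114),
        (1, 1, 3, "event end", 248641),
        (1, 1, 4, "other", 398215),
        (1, 1, 5, "event start", 488284),
        (1, 1, 6, "other", 528445),
        (1, 1, 7, "other", 572711),
        (1, 1, 8, "event end", 598123),
    ],
)

# Same join conditions as the attempt above: every start in a session
# is paired with every end in that session, regardless of order.
naive_pairs = conn.execute(
    """
    SELECT a.milliseconds AS start_ms, b.milliseconds AS end_ms
    FROM actions a
    INNER JOIN actions b
       ON a.userid = b.userid
      AND a.sessionid = b.sessionid
      AND a.action = 'event start'
      AND b.action = 'event end'
    ORDER BY start_ms, end_ms
    """
).fetchall()

for pair in naive_pairs:
    print(pair)
# (0, 248641)
# (0, 598123)
# (488284, 248641)
# (488284, 598123)
```

Two starts and two ends yield four rows, including one where the end (248641) comes before the start (488284). Only the first and last rows are the pairs I actually want.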
Answer 0 (score: 1)
So, given your data above:
CREATE TABLE test_table (
`userid` int,
`sessionid` int,
`actionSequence` int,
`action` varchar(11),
`milliseconds` int
);
INSERT INTO test_table
(`userid`, `sessionid`, `actionSequence`, `action`, `milliseconds`)
VALUES
(1, 1, 1, 'event start', 0),
(1, 1, 2, 'other', 188114),
(1, 1, 3, 'event end', 248641),
(1, 1, 4, 'other', 398215),
(1, 1, 5, 'event start', 488284),
(1, 1, 6, 'other', 528445),
(1, 1, 7, 'other', 572711),
(1, 1, 8, 'event end', 598123),
(1, 2, 1, 'event start', 0),
(1, 2, 2, 'event end', 54363),
(2, 1, 1, 'other', 0),
(2, 1, 2, 'other', 2345),
(2, 1, 1, 'other', 75647),
(3, 1, 2, 'other', 0),
(3, 1, 3, 'event start', 34678),
(3, 1, 4, 'other', 46784),
(3, 1, 5, 'other', 78905),
(4, 1, 1, 'event start', 0),
(4, 1, 2, 'other', 7454),
(4, 1, 3, 'other', 11245),
(4, 1, 4, 'event end', 24567),
(4, 1, 5, 'other', 29562),
(4, 1, 6, 'other', 43015);
The following query should help (you were on the right track):
SELECT
tt1.userid,
tt1.sessionid,
tt1.actionSequence,
tt1.milliseconds AS startMS,
MIN(tt2.milliseconds) AS endMS,
MIN(tt2.milliseconds) - tt1.milliseconds AS totalMS
FROM test_table tt1
INNER JOIN test_table tt2
ON tt2.userid = tt1.userid
AND tt2.sessionid = tt1.sessionid
AND tt2.actionSequence > tt1.actionSequence
AND tt2.action = 'event end'
WHERE tt1.action = 'event start'
GROUP BY tt1.userid, tt1.sessionid, tt1.actionSequence, tt1.milliseconds;
This gives you the following result set:
userid sessionid actionSequence startMS endMS totalMS
1 1 1 0 248641 248641
1 1 5 488284 598123 109839
1 2 1 0 54363 54363
4 1 1 0 24567 24567
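The GROUP BY is important because, for sessionid = 1 and userid = 1, there are two rows with action = 'event end' and a sequence > 1, so (I assume) we want the one closest to the current sequence, i.e. MIN(milliseconds). As you can see, it also lets you compute the difference between the two columns directly in this result set, which saves you the extra step you had planned :]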
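One case the MIN approach does not handle by itself is a start that never ends, followed later in the same session by a complete event: both starts would be matched to the same end. The sample data doesn't contain this pattern, so the session below (userid 9) is hypothetical, and the `NOT EXISTS` guard is one possible way to handle it, offered as a sketch rather than part of the query above. Python's `sqlite3` module is used only as a harness to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (userid INT, sessionid INT, "
    "actionSequence INT, action TEXT, milliseconds INT)"
)

# Hypothetical session (not from the question): the first start never ends.
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?, ?, ?)",
    [
        (9, 1, 1, "event start", 0),
        (9, 1, 2, "other", 100),
        (9, 1, 3, "event start", 200),
        (9, 1, 4, "event end", 300),
    ],
)

# The pairing query from above, with a slot for an optional extra filter.
BASE = """
SELECT tt1.milliseconds AS startMS, MIN(tt2.milliseconds) AS endMS
FROM test_table tt1
INNER JOIN test_table tt2
   ON tt2.userid = tt1.userid
  AND tt2.sessionid = tt1.sessionid
  AND tt2.actionSequence > tt1.actionSequence
  AND tt2.action = 'event end'
WHERE tt1.action = 'event start'
{guard}
GROUP BY tt1.userid, tt1.sessionid, tt1.actionSequence, tt1.milliseconds
ORDER BY tt1.actionSequence
"""

# Assumed guard: discard a start/end pair if another start lies between them.
GUARD = """
  AND NOT EXISTS (
      SELECT 1
      FROM test_table s
      WHERE s.userid = tt1.userid
        AND s.sessionid = tt1.sessionid
        AND s.action = 'event start'
        AND s.actionSequence > tt1.actionSequence
        AND s.actionSequence < tt2.actionSequence
  )
"""

unguarded = conn.execute(BASE.format(guard="")).fetchall()
guarded = conn.execute(BASE.format(guard=GUARD)).fetchall()

print(unguarded)  # [(0, 300), (200, 300)]
print(guarded)    # [(200, 300)]
```

Whether the abandoned start should be dropped like this depends on what your data means, so treat the guard as optional.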
Here is a SQLFiddle. You didn't specify an RDBMS, but I believe the SQL used in this query is simple enough to work in any SQL engine.
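The query returns actionSequence rather than the "full event id" column from your desired output. If you need that numbering, one option is to wrap the query and number the pairs per session with `ROW_NUMBER()`. The sketch below assumes an engine with window functions (SQLite 3.25 or later here, run through Python's built-in `sqlite3` module); the `fullEventId` and `paired` names are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (userid INT, sessionid INT, "
    "actionSequence INT, action TEXT, milliseconds INT)"
)

# The sample rows from the question, unchanged.
sample = [
    (1, 1, 1, "event start", 0), (1, 1, 2, "other", 188114),
    (1, 1, 3, "event end", 248641), (1, 1, 4, "other", 398215),
    (1, 1, 5, "event start", 488284), (1, 1, 6, "other", 528445),
    (1, 1, 7, "other", 572711), (1, 1, 8, "event end", 598123),
    (1, 2, 1, "event start", 0), (1, 2, 2, "event end", 54363),
    (2, 1, 1, "other", 0), (2, 1, 2, "other", 2345),
    (2, 1, 1, "other", 75647), (3, 1, 2, "other", 0),
    (3, 1, 3, "event start", 34678), (3, 1, 4, "other", 46784),
    (3, 1, 5, "other", 78905), (4, 1, 1, "event start", 0),
    (4, 1, 2, "other", 7454), (4, 1, 3, "other", 11245),
    (4, 1, 4, "event end", 24567), (4, 1, 5, "other", 29562),
    (4, 1, 6, "other", 43015),
]
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?, ?)", sample)

# Inner query: pair each start with the nearest later end (as above).
# Outer query: number the resulting pairs within each user/session.
full_events = conn.execute(
    """
    SELECT userid,
           sessionid,
           ROW_NUMBER() OVER (
               PARTITION BY userid, sessionid
               ORDER BY startMS
           ) AS fullEventId,
           startMS,
           endMS
    FROM (
        SELECT tt1.userid,
               tt1.sessionid,
               tt1.milliseconds AS startMS,
               MIN(tt2.milliseconds) AS endMS
        FROM test_table tt1
        INNER JOIN test_table tt2
           ON tt2.userid = tt1.userid
          AND tt2.sessionid = tt1.sessionid
          AND tt2.actionSequence > tt1.actionSequence
          AND tt2.action = 'event end'
        WHERE tt1.action = 'event start'
        GROUP BY tt1.userid, tt1.sessionid,
                 tt1.actionSequence, tt1.milliseconds
    ) AS paired
    ORDER BY userid, sessionid, fullEventId
    """
).fetchall()

for row in full_events:
    print(row)
# (1, 1, 1, 0, 248641)
# (1, 1, 2, 488284, 598123)
# (1, 2, 1, 0, 54363)
# (4, 1, 1, 0, 24567)
```

This matches the four rows in the table you asked for. Numbering by startMS relies on milliseconds increasing within a session, which holds for every start in the sample data.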