I'm new to BQ and not sure how costly it would be to do this with a query.
I have a table that records the access time of every user, like this:
user_id    access_time
-------------------------------
user_a     2015-06-15 14:12:12
user_b     2015-06-15 14:12:12
user_a     2015-06-15 14:12:13
user_a     2015-06-15 14:12:19
user_a     2015-06-15 14:12:28
user_a     2015-06-15 19:32:15
user_a     2015-06-15 19:32:19
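I want to generate a table of active sessions that represents every activity window of each user. Each session has a start time and a duration (in seconds).
A session expires if the next access does not arrive within 10 seconds of the previous one.
An example of the session table: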
session_id  user_id  session_start_time   duration
---------------------------------------------------
1           user_a   2015-06-15 14:12:12  16
2           user_b   2015-06-15 14:12:12  0
3           user_a   2015-06-15 19:32:15  4
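BQ doesn't seem to support user-defined functions, so how can I achieve this with a single query?
Thanks in advance!
Update:
Fixed the example.
Answer 0 (score: 4)
To illustrate the approach using the data from your example, here is a query that lists the new sessions with their start times: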
    select user, ts start_time from (
      -- a row opens a new session if it has no predecessor or the gap exceeds 10 seconds
      select user, ts, ifnull(seconds - prev_seconds > 10, true) new_session from (
        select user, ts, seconds,
               lag(seconds, 1) over(partition by user order by seconds) prev_seconds
        from
          -- timestamps are microseconds since the epoch, so divide to get seconds
          (select user, ts, integer(ts/1000000) seconds from
            -- in legacy SQL, a comma between subqueries means UNION ALL
            (select 'user_a' user, timestamp('2015-06-15 14:12:12') ts),
            (select 'user_b' user, timestamp('2015-06-15 14:12:12') ts),
            (select 'user_a' user, timestamp('2015-06-15 14:12:13') ts),
            (select 'user_a' user, timestamp('2015-06-15 14:12:19') ts),
            (select 'user_a' user, timestamp('2015-06-15 14:12:28') ts),
            (select 'user_a' user, timestamp('2015-06-15 19:32:15') ts),
            (select 'user_a' user, timestamp('2015-06-15 19:32:19') ts))))
    where new_session
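If you want to sanity-check the lag-based test outside BigQuery, the following is a minimal plain-Python sketch of the same idea, not a translation of the query. The names `ROWS`, `GAP_SECONDS` and `session_starts` are my own and purely illustrative; the `prev` variable plays the role of `lag(seconds, 1)` within each user's partition.

```python
from datetime import datetime
from itertools import groupby

GAP_SECONDS = 10  # session timeout taken from the question

# Sample rows from the question (hypothetical in-memory stand-in for the table).
ROWS = [
    ("user_a", "2015-06-15 14:12:12"),
    ("user_b", "2015-06-15 14:12:12"),
    ("user_a", "2015-06-15 14:12:13"),
    ("user_a", "2015-06-15 14:12:19"),
    ("user_a", "2015-06-15 14:12:28"),
    ("user_a", "2015-06-15 19:32:15"),
    ("user_a", "2015-06-15 19:32:19"),
]


def session_starts(rows, gap=GAP_SECONDS):
    """Return (user, timestamp) pairs for every access that opens a new session."""
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for user, ts in rows
    )
    starts = []
    # groupby on sorted data mimics "partition by user order by seconds"
    for user, group in groupby(parsed, key=lambda row: row[0]):
        prev = None  # no previous access yet, like a NULL from lag()
        for _, ts in group:
            if prev is None or (ts - prev).total_seconds() > gap:
                starts.append((user, ts))
            prev = ts
    return starts


for user, ts in session_starts(ROWS):
    print(user, ts)
```

For the sample data this marks two session starts for user_a and one for user_b, which matches what the query above is meant to return.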
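To get the duration of each session, we can run another window function instead of doing a self-join. Basically, we first mark the rows that start or end a session, keep only those rows, and then compute the difference between each session start and the session end that follows it: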
    -- a session that starts and ends on the same row has duration 0
    select user, ts, if(not last_session and next_is_last, next_seconds - seconds, 0) duration
    from (
      select
        user, new_session, last_session, ts, seconds,
        lead(seconds, 1) over(partition by user order by seconds) next_seconds,
        lead(last_session, 1) over(partition by user order by seconds) next_is_last
      from (
        select
          user,
          ts,
          seconds,
          ifnull(seconds - prev_seconds > 10, true) new_session,
          ifnull(next_seconds - seconds > 10, true) last_session
        from (
          select
            user,
            ts,
            seconds,
            lag(seconds, 1) over(partition by user order by seconds) prev_seconds,
            lead(seconds, 1) over(partition by user order by seconds) next_seconds
          from
            (select user, ts, integer(ts/1000000) seconds from
              (select 'user_a' user, timestamp('2015-06-15 14:12:12') ts),
              (select 'user_b' user, timestamp('2015-06-15 14:12:12') ts),
              (select 'user_a' user, timestamp('2015-06-15 14:12:13') ts),
              (select 'user_a' user, timestamp('2015-06-15 14:12:19') ts),
              (select 'user_a' user, timestamp('2015-06-15 14:12:28') ts),
              (select 'user_a' user, timestamp('2015-06-15 19:32:15') ts),
              (select 'user_a' user, timestamp('2015-06-15 19:32:19') ts))))
      -- keep only session boundaries before the outer lead() functions run
      where new_session or last_session)
    where new_session
This results in:
Row  user    ts                       duration
1    user_a  2015-06-15 14:12:12 UTC  16
2    user_a  2015-06-15 19:32:15 UTC  4
3    user_b  2015-06-15 14:12:12 UTC  0
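To double-check the expected durations independently of BigQuery, here is a small plain-Python sketch that walks each user's accesses in order and closes a session whenever the gap exceeds the timeout. It is only an illustration of the start/end logic, not how BigQuery evaluates the query; `ROWS` and `sessionize` are hypothetical names of my own.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question (hypothetical in-memory stand-in for the table).
ROWS = [
    ("user_a", "2015-06-15 14:12:12"),
    ("user_b", "2015-06-15 14:12:12"),
    ("user_a", "2015-06-15 14:12:13"),
    ("user_a", "2015-06-15 14:12:19"),
    ("user_a", "2015-06-15 14:12:28"),
    ("user_a", "2015-06-15 19:32:15"),
    ("user_a", "2015-06-15 19:32:19"),
]


def sessionize(rows, gap=10):
    """Return (user, session_start, duration_seconds) for every session."""
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")) for user, ts in rows
    )
    sessions = []
    for user, group in groupby(parsed, key=lambda row: row[0]):
        start = end = None
        for _, ts in group:
            # Gap too large: the open session ended at the previous access.
            if start is not None and (ts - end).total_seconds() > gap:
                sessions.append((user, start, int((end - start).total_seconds())))
                start = None
            if start is None:
                start = ts  # this access opens a new session
            end = ts
        # Close the session that is still open after the user's last access.
        sessions.append((user, start, int((end - start).total_seconds())))
    return sessions


for user, start, duration in sessionize(ROWS):
    print(user, start, duration)
```

A user whose accesses are all far apart gets one zero-length session per access, which is the edge case worth testing against any SQL version of this logic.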
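Answer 1 (score: 1)
It's a bit hard to answer without access to the dataset itself, but this is the logical flow I would implement:
Join two sub-tables, with a condition such as:
on a.user_id = b.user_id and b.access_time >= a.session_start_time and b.access_time < next_session_time
Then just aggregate for each user and session.
It's probably not the most efficient way (saving the partial results to a temporary table would avoid running over all the data twice), but it should work.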
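As a rough illustration of that flow, here is a plain-Python sketch under my own assumptions: table `a` holds one row per session start together with the same user's next session start, table `b` is the raw access log, and "aggregate" means taking the latest access inside each window. The names `ACCESSES`, `session_windows` and `join_and_aggregate` are hypothetical, and the answer does not say how table `a` is built, so that part is a guess.

```python
from datetime import datetime

GAP = 10  # session timeout in seconds, from the question


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


# Table "b": the raw access log from the question.
ACCESSES = [
    ("user_a", parse("2015-06-15 14:12:12")),
    ("user_b", parse("2015-06-15 14:12:12")),
    ("user_a", parse("2015-06-15 14:12:13")),
    ("user_a", parse("2015-06-15 14:12:19")),
    ("user_a", parse("2015-06-15 14:12:28")),
    ("user_a", parse("2015-06-15 19:32:15")),
    ("user_a", parse("2015-06-15 19:32:19")),
]


def session_windows(accesses, gap=GAP):
    """Table "a" (assumed shape): (user, session_start_time, next_session_time)."""
    by_user = {}
    for user, ts in sorted(accesses):
        by_user.setdefault(user, []).append(ts)
    windows = []
    for user, times in by_user.items():
        starts = [
            t for i, t in enumerate(times)
            if i == 0 or (t - times[i - 1]).total_seconds() > gap
        ]
        # The last session of a user has no next session, hence None.
        for start, nxt in zip(starts, starts[1:] + [None]):
            windows.append((user, start, nxt))
    return windows


def join_and_aggregate(accesses, windows):
    """Join b to a on the answer's condition, then aggregate per user and session."""
    result = []
    for user, start, nxt in windows:
        matched = [
            ts for u, ts in accesses
            if u == user and ts >= start and (nxt is None or ts < nxt)
        ]
        # The session start itself always matches, so max() is safe here.
        result.append((user, start, int((max(matched) - start).total_seconds())))
    return result


for row in join_and_aggregate(ACCESSES, session_windows(ACCESSES)):
    print(row)
```

The nested loop scans the whole access list once per session, which mirrors the answer's remark that the join is probably not the most efficient route.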
Answer 2 (score: 0)
OK, Mosha's answer was enlightening, and I tried out this solution. The key points are:
Here is the script:
    select user,
      case
        when not new_session and end_of_session then seconds - start_time
        when new_session and end_of_session then 0
      end as duration,
      case
        when not new_session and end_of_session then start_time
        when new_session and end_of_session then seconds
      end as session_start,
      seconds as session_end from
    -- start_time is the previous boundary row, i.e. the start of the session this row ends
    (select *, lag(seconds, 1) over (partition by user order by seconds, prev_seconds) as start_time from
      (select user, seconds, new_session, ifnull(end_session_temp, true) end_of_session, prev_seconds from
        -- a row ends a session when the next row starts one (or when there is no next row)
        (select user, seconds, new_session, prev_seconds, lead(new_session, 1) over (partition by user order by seconds, prev_seconds) as end_session_temp from
          (select user, seconds, new_session, prev_seconds from
            (select user, seconds, prev_seconds, ifnull(seconds - prev_seconds > 10, true) new_session from
              (select user, ts, seconds, lag(seconds, 1) over(partition by user order by seconds) as prev_seconds from
                (select user, ts, integer(ts/1000000) seconds from
                  (select 'user_a' user, timestamp('2015-06-15 14:12:12') ts),
                  (select 'user_b' user, timestamp('2015-06-15 14:12:12') ts),
                  (select 'user_a' user, timestamp('2015-06-15 14:12:13') ts),
                  (select 'user_a' user, timestamp('2015-06-15 14:12:19') ts),
                  (select 'user_a' user, timestamp('2015-06-15 14:12:28') ts),
                  (select 'user_a' user, timestamp('2015-06-15 19:32:15') ts),
                  (select 'user_a' user, timestamp('2015-06-15 19:32:19') ts))))))
      -- keep only the rows that start or end a session
      where (new_session or end_session_temp is null or end_session_temp)))
    -- emit one row per session, taken from its last access
    where not (new_session and not end_of_session)
The output is:
Row  user    duration  session_start  session_end
1    user_b  0         1434377532     1434377532
2    user_a  16        1434377532     1434377548
3    user_a  4         1434396735     1434396739
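Note that session_start and session_end come out as epoch seconds, because the script works on `integer(ts/1000000)` rather than on the timestamp itself. If you want to confirm that they line up with the timestamps in the question, a quick plain-Python check could look like the sketch below; `OUTPUT` and `to_utc` are just illustrative names of my own.

```python
from datetime import datetime, timezone

# Rows copied from the output above: (user, duration, session_start, session_end).
OUTPUT = [
    ("user_b", 0, 1434377532, 1434377532),
    ("user_a", 16, 1434377532, 1434377548),
    ("user_a", 4, 1434396735, 1434396739),
]


def to_utc(seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


for user, duration, start, end in OUTPUT:
    # The duration column should equal session_end - session_start.
    assert end - start == duration
    print(user, duration, to_utc(start).isoformat(), to_utc(end).isoformat())
```

The first session_start converts back to 2015-06-15 14:12:12 UTC and the last one to 2015-06-15 19:32:15 UTC, which agrees with the session table in the question.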