I have a table like this:
Item Selected  Session ID  Created
A              1           2017-11-25T02:22:23
B              1           2017-11-25T02:22:24
B              1           2017-11-25T02:22:25
C              1           2017-11-25T02:22:17
D              1           2017-11-25T02:22:27
A              2           2017-11-25T02:22:28
C              2           2017-11-25T02:22:30
D              2           2017-11-25T02:22:06
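Given item A, I want to know the 3-5 items most likely to be selected in the same session ID, across all sessions.
In other words, after a user selects item A, which items do they most often select next?
The preferred output for a query on item A would look something like: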
2nd Item Selected  Percent of time selected
B                  33%
C                  33%
D                  33%
Is this possible in SQL?
Edit: this is the current solution, but it does not work in BigQuery. I am posting my exact code, with only the table name changed:
select `tag_touched`, count(*) / numsessions as ratio
from (select s.`session_id`, `tag_touched`, max(created) as maxcreated, a.maxcreated_a, ss.numsessions
      from [TABLENAME] s join
           (select s.`session_id`, max(s.Created) as maxcreated_a
            from [TABLENAME] s
            where `tag_touched` = 'A'
            group by s.`session_id`,
           ) a
           on s.`session_id` = a.`session_id` cross join
           (select count(distinct `session_id`) as numsessions
            from [TABLENAME]
            where `tag_touched` = 'A'
           ) ss
      group by s.`session_id`, s.`tag_touched`, a.maxcreated_a, ss.numsessions
      having max(created) > maxcreated_a
     ) s
group by `tag_touched`;
But I get this error back:
Error: Expression '`tag_touched`' is not present in the GROUP BY list
Why?
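Answer 0 (score: 0)
You can use aggregation to get the list of items selected after a given item within the same session. I think this may be enough for your purposes: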
select item, count(*) * 1.0 / numsessions as ratio  -- * 1.0 avoids integer division on engines that truncate
from (select s.sessionId, s.item, max(s.created) as maxcreated, a.maxcreated_a, ss.numsessions
      from sessions s join
           (select sessionId, max(created) as maxcreated_a  -- last time "A" was selected in each session
            from sessions
            where item = 'A'
            group by sessionId
           ) a
           on s.sessionId = a.sessionId cross join
           (select count(distinct sessionId) as numsessions  -- number of sessions that contain "A"
            from sessions
            where item = 'A'
           ) ss
      group by s.sessionId, s.item, a.maxcreated_a, ss.numsessions
      having max(s.created) > a.maxcreated_a  -- appeared after the last "A"
     ) s
group by item, numsessions  -- numsessions is not aggregated, so it has to be grouped
order by ratio desc;
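To check the logic against the sample rows from the question, here is a minimal sketch that runs the same aggregation in an in-memory SQLite database through Python's sqlite3 module. The helper name items_after, the column types, and the choice of SQLite are assumptions made for illustration; BigQuery syntax and behaviour may differ, so treat this as a way to verify the idea and not as the BigQuery solution.

```python
import sqlite3

# Sample rows from the question: (item, sessionId, created).
ROWS = [
    ("A", 1, "2017-11-25T02:22:23"),
    ("B", 1, "2017-11-25T02:22:24"),
    ("B", 1, "2017-11-25T02:22:25"),
    ("C", 1, "2017-11-25T02:22:17"),
    ("D", 1, "2017-11-25T02:22:27"),
    ("A", 2, "2017-11-25T02:22:28"),
    ("C", 2, "2017-11-25T02:22:30"),
    ("D", 2, "2017-11-25T02:22:06"),
]

# Same shape as the answer's query, with the item passed as a named parameter.
QUERY = """
select item, count(*) * 1.0 / numsessions as ratio
from (select s.sessionId, s.item, ss.numsessions
      from sessions s join
           (select sessionId, max(created) as maxcreated_a
            from sessions
            where item = :item
            group by sessionId
           ) a
           on s.sessionId = a.sessionId cross join
           (select count(distinct sessionId) as numsessions
            from sessions
            where item = :item
           ) ss
      group by s.sessionId, s.item, a.maxcreated_a, ss.numsessions
      having max(s.created) > a.maxcreated_a
     ) t
group by item, numsessions
order by ratio desc, item
"""


def items_after(item, rows=ROWS):
    """Return (item, ratio) pairs for items selected after the last `item` in a session.

    Hypothetical helper: loads `rows` into a throwaway in-memory table and runs QUERY.
    ISO 8601 timestamps are stored as text, which compares correctly as strings.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sessions (item text, sessionId integer, created text)")
    conn.executemany("insert into sessions values (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"item": item}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name, ratio in items_after("A"):
        print(name, f"{ratio:.0%}")
```

On the sample data this reports B, C and D at 50% each: the denominator is the number of sessions that contain A (two), and each of those items appears after the last A in exactly one of them. Note that an item is counted once per session, however many times it was selected after A.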