Most common second query, given the first - SQL GROUP BY

Time: 2018-03-05 16:10:57

Tags: mysql sql aggregate

I have a table like this:

Item Selected    Session ID      Created
    A             1         2017-11-25T02:22:23
    B             1         2017-11-25T02:22:24
    B             1         2017-11-25T02:22:25
    C             1         2017-11-25T02:22:17
    D             1         2017-11-25T02:22:27
    A             2         2017-11-25T02:22:28
    C             2         2017-11-25T02:22:30
    D             2         2017-11-25T02:22:06

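Given item A, I want to know the 3-5 items most likely to be selected in the same session ID, across all sessions.

In other words, after users select item A, which items do they select most often?

The preferred output of the query for item A would look something like this: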

2nd Item Selected       Percent of time selected
     B                      33%
     C                      33%
     D                      33%

Is this possible in SQL?

Edit: This is the current solution, but it does not work in BigQuery. I am posting my exact code, with only table_name changed:

select `tag_touched`, count(*) / numsessions as ratio
from (select s.`session_id`, `tag_touched`, max(created) as maxcreated, a.maxcreated_a, ss.numsessions
      from [TABLENAME] s join
           (select s.`session_id`, max(s.Created) as maxcreated_a
            from [TABLENAME] s
            where `tag_touched` = 'A'
            group by s.`session_id`,
       ) a
       on s.`session_id` = a.`session_id` cross join
       (select count(distinct `session_id`) as numsessions
        from [TABLENAME]
        where `tag_touched` = 'A'
       ) ss
  group by s.`session_id`, s.`tag_touched`, a.maxcreated_a, ss.numsessions
  having max(created) > maxcreated_a
 ) s
group by `tag_touched`;

But I am getting this error back:

Error: Expression '`tag_touched`' is not present in the GROUP BY list

Any idea why?

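1 Answer:

Answer 0: (score: 0)

You can use aggregation to get the list of items that were selected after a given item in the same session. I think this might be enough for your purposes: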

select item, count(*) / numsessions as ratio
from (select s.sessionId, s.item, max(s.created) as maxcreated, a.maxcreated_a, ss.numsessions
      from sessions s join
           (select sessionId, max(created) as maxcreated_a
            from sessions s
            where item = 'A'
            group by sessionId
           ) a
           on s.sessionId = a.sessionId cross join
           (select count(distinct sessionId) as numsessions
            from sessions
            where item = 'A'
           ) ss
      group by s.sessionId, s.item, a.maxcreated_a, ss.numsessions
      having max(created) > maxcreated_a -- appeared after the last "A"
     ) s
group by item, numsessions; -- numsessions is selected above, so group by it as well
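
To see what this query returns on the sample rows from the question, here is a minimal sketch that runs it from Python against an in-memory SQLite database. SQLite only stands in for MySQL here, and the `items_after` helper, the `ROWS` constant, and the `sessions` table layout are assumptions made for illustration. Two small adjustments are made for portability: the ratio is multiplied by `1.0` because SQLite would otherwise do integer division, and `numsessions` is added to the outer GROUP BY.

```python
import sqlite3

# Sample rows from the question: (item, sessionId, created).
ROWS = [
    ("A", 1, "2017-11-25T02:22:23"),
    ("B", 1, "2017-11-25T02:22:24"),
    ("B", 1, "2017-11-25T02:22:25"),
    ("C", 1, "2017-11-25T02:22:17"),
    ("D", 1, "2017-11-25T02:22:27"),
    ("A", 2, "2017-11-25T02:22:28"),
    ("C", 2, "2017-11-25T02:22:30"),
    ("D", 2, "2017-11-25T02:22:06"),
]

# Same shape as the query in the answer, with the item passed as a parameter.
QUERY = """
select item, count(*) * 1.0 / numsessions as ratio
from (select s.sessionId, s.item, max(s.created) as maxcreated,
             a.maxcreated_a, ss.numsessions
      from sessions s join
           (select sessionId, max(created) as maxcreated_a
            from sessions
            where item = ?
            group by sessionId
           ) a
           on s.sessionId = a.sessionId cross join
           (select count(distinct sessionId) as numsessions
            from sessions
            where item = ?
           ) ss
      group by s.sessionId, s.item, a.maxcreated_a, ss.numsessions
      having max(s.created) > a.maxcreated_a  -- appeared after the last given item
     ) t
group by item, numsessions
order by ratio desc, item
"""


def items_after(item, rows=ROWS):
    """Return (item, ratio) pairs for items selected after `item` in the same session."""
    conn = sqlite3.connect(":memory:")
    # ISO-8601 timestamps stored as text still compare in chronological order.
    conn.execute("create table sessions (item text, sessionId integer, created text)")
    conn.executemany("insert into sessions values (?, ?, ?)", rows)
    result = conn.execute(QUERY, (item, item)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name, ratio in items_after("A"):
        print(name, ratio)
```

On the sample data, B and D come after A in session 1 and C comes after A in session 2, so each gets a ratio of 0.5 (one of the two sessions that contain A). That differs from the 33% in the question, which appears to divide by the total number of follow-up picks rather than by the number of sessions; if that is the intended denominator, the divisor would need to change accordingly.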