Query with partition by and count

Time: 2017-11-04 14:02:48

Tags: sql postgresql window-functions

Given the following table, which records the user and item view history by session:

 create table view_log (
   server_time timestamp,
   device char(2),
   session_id char(10),
   uid char(7),
   item_id char(7)
 );

I am trying to understand what the following code does:

create table coo_cs as
select
  item_id,
  session_id,
  count(distinct session_id) / (sum(count(distinct session_id)) over (partition by item_id)) cs
from view_log
group by item_id, session_id;

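I tried pulling the partition by expression out on its own to see what it does, but then the query fails with "DISTINCT is not implemented for window functions".

I understand basic partition by and group by, but I cannot make sense of the SQL above.

  • Edit

There is a fairly large test data set: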

http://pakdd2017.recobell.io/site_view_log_small.csv000.gz

1 Answer:

Answer 0 (score: 0)

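Some databases do not (yet) support count(distinct) as a window function, and PostgreSQL is one of them, which is where the error comes from. For this query, count(distinct) is not necessary, because you are grouping by the same column that you pass to count(distinct). Each group therefore contains a single session_id value, so count(distinct session_id) is 1 on every row (or 0 for a group whose session_id is NULL).

Your query is essentially: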

select item_id, session_id,
       1.0 / count(session_id) over (partition by item_id) as cs
from view_log
group by item_id, session_id;
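
To see why the two forms agree, here is a minimal sketch on a few made-up rows. The `demo_log` and `grouped` names and all values are hypothetical stand-ins for view_log, not taken from the linked data set. It splits the original statement into its two stages: the group by runs first, and the window function then runs over the grouped rows, which also avoids putting DISTINCT inside a window function.

```sql
-- Hypothetical sample rows standing in for view_log (assumed data).
with demo_log (item_id, session_id) as (
  values ('A', 's1'), ('A', 's1'), ('A', 's2'), ('A', 's3'),
         ('B', 's1'), ('B', 's4')
),
-- Stage 1: the group by. Each group holds one session_id, so n is 1.
grouped as (
  select item_id, session_id, count(distinct session_id) as n
  from demo_log
  group by item_id, session_id
)
-- Stage 2: the window functions see the grouped rows, one partition per item_id.
select item_id, session_id, n,
       sum(n) over (partition by item_id) as sessions_for_item,
       n / sum(n) over (partition by item_id) as cs_original,
       1.0 / count(session_id) over (partition by item_id) as cs_simplified
from grouped
order by item_id, session_id;
```

With these rows, item A has three distinct sessions and item B has two, so every row of A gets cs = 1/3 and every row of B gets cs = 1/2 in both columns.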

I would not be surprised if you actually wanted the ratio at the item_id level, in which case the intended query would be:

select item_id, count(distinct session_id),
       count(distinct session_id) * 1.0 / sum(count(distinct session_id)) over () as cs
from view_log
group by item_id;
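
As a rough illustration of the item-level ratio, the sketch below runs the same shape of query over a handful of made-up rows. The `demo_log` name and its values are hypothetical, chosen only to make the arithmetic easy to follow.

```sql
-- Hypothetical sample rows standing in for view_log (assumed data).
with demo_log (item_id, session_id) as (
  values ('A', 's1'), ('A', 's1'), ('A', 's2'), ('A', 's3'),
         ('B', 's1'), ('B', 's4')
)
select item_id,
       count(distinct session_id) as numsessions,
       -- The empty over () makes the whole grouped result a single partition.
       count(distinct session_id) * 1.0
         / sum(count(distinct session_id)) over () as cs
from demo_log
group by item_id
order by item_id;
```

Here item A has 3 distinct sessions and item B has 2, so the ratios are 3/5 and 2/5. Note that the denominator is the sum of the per-item counts (5), not the number of distinct sessions overall (4), because session s1 viewed both items and is counted once for each.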

If so, the equivalent logic can be written with a subquery:

select vl.*, vl.numsessions * 1.0 / sum(vl.numsessions) over () as cs
from (select item_id, count(distinct session_id) as numsessions
      from view_log vl
      group by item_id
     ) vl;