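I have a user traffic table and need the gain/loss in new users compared with the previous day. I'm wondering whether there is a better way to do this than the solution below.
Schema:
Table structure: Session_ID, session_day, user_id, product_id
What have I tried?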
SELECT session_day,
       session_count,
       user_count - LAG( user_count, 1 ) OVER ( ORDER BY session_day ) AS gain_loss_users
FROM
(
    SELECT session_day,
           COUNT( session_id ) AS session_count,
           COUNT( user_id ) AS user_count
    FROM user_traffic
    GROUP BY session_day
) X ;
Answer 0 (score: 1)
I tried to solve the "new" and "returning" problem. Here is my attempt:
select session_day,
       count(distinct user_id) as user_cnt,
       -- change in distinct users versus the previous row (the previous day that has sessions)
       count(distinct user_id) - lag(count(distinct user_id))
                                   over (order by session_day) gain,
       count(newu) as newu,
       count(returnu) as returnu
from (
    select session_id,
           session_day,
           user_id,
           -- 1 on the user's very first session, NULL otherwise
           case when
               count(*) over (partition by user_id order by session_day, session_id
                              rows between unbounded preceding and current row)
               = 1
           then 1
           end as newu,
           -- 1 when the user's previous session day differs from the day of the
           -- immediately preceding session overall, NULL otherwise
           case when
               lag(session_day, 1) over (partition by user_id order by session_day, session_id)
               <>
               lag(session_day, 1) over (order by session_day, session_id)
           then 1
           end as returnu
    from user_traffic u
)
group by session_day
order by session_day;
Test data and output:
create table user_traffic (session_id number(6), session_day date,
user_id number(6), product_id number(6));
insert into user_traffic values ( 1, date '2016-09-07', 101, 1);
insert into user_traffic values ( 2, date '2016-09-07', 101, 4);
insert into user_traffic values ( 3, date '2016-09-07', 102, 1);
insert into user_traffic values ( 4, date '2016-09-08', 101, 2);
insert into user_traffic values ( 5, date '2016-09-08', 101, 4);
insert into user_traffic values ( 6, date '2016-09-09', 102, 1);
insert into user_traffic values ( 7, date '2016-09-10', 102, 1);
insert into user_traffic values ( 8, date '2016-09-10', 103, 3);
SESSION_DAY        CNT       GAIN        NEW    RETURNS
----------- ---------- ---------- ---------- ----------
2016-09-07           2                     2          0  -- 101 & 102 are new
2016-09-08           1         -1          0          0
2016-09-09           1          0          0          1  -- 102 returned
2016-09-10           2          1          1          0  -- 103 is new
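If only the day-over-day change in new users is needed, one possible variation is to find each user's first session day and count users per day whose first day is that day. This is a rough sketch, not a tested replacement for the query above: the aliases first_day, new_users and new_user_gain_loss are made up for illustration, and it assumes session_day carries no time-of-day component.

```sql
-- Sketch only: day-over-day change in the number of first-time users.
-- Assumes session_day is a pure date (no time part) in user_traffic.
SELECT session_day,
       new_users,
       new_users - LAG(new_users) OVER (ORDER BY session_day) AS new_user_gain_loss
FROM (
    SELECT session_day,
           -- a user is "new" on the day that equals their earliest session day
           COUNT(DISTINCT CASE WHEN session_day = first_day THEN user_id END) AS new_users
    FROM (
        SELECT session_day,
               user_id,
               MIN(session_day) OVER (PARTITION BY user_id) AS first_day
        FROM user_traffic
    ) t
    GROUP BY session_day
) d
ORDER BY session_day;
```

Note that LAG compares against the previous row, so a calendar day with no sessions at all is skipped rather than counted as zero.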
Answer 1 (score: 0)
There is no better way, but there is a more concise one. You can mix window functions with aggregate functions:
SELECT session_day,
COUNT(session_id ) as session_count,
COUNT(DISTINCT user_id ) as user_count,
(COUNT(DISTINCT user_id ) -
LAG(COUNT(DISTINCT user_id )) OVER (ORDER BY session_day)
) as gain_loss_users
FROM user_traffic
GROUP BY session_day;
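I think you need COUNT(DISTINCT), because (1) a user may have multiple sessions on the same day, and (2) without DISTINCT the two counts are identical (as long as user_id and session_id are never NULL).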
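To make the difference concrete, here is a small sketch that puts the three counts side by side. It assumes the same user_traffic table as above; the aliases user_rows and distinct_users are just illustrative names.

```sql
-- Sketch: compare the plain and DISTINCT counts per day.
SELECT session_day,
       COUNT(session_id)       AS session_count,
       COUNT(user_id)          AS user_rows,      -- counts rows, so it matches session_count when neither column is NULL
       COUNT(DISTINCT user_id) AS distinct_users  -- counts each user once per day
FROM user_traffic
GROUP BY session_day
ORDER BY session_day;
```

With the sample rows from the other answer, 2016-09-07 has three sessions from two users, so user_rows is 3 while distinct_users is 2.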