In a MySQL database I have the following table (it is customer panel data):
user | tab       | action | time
77   | -         | login  | 1407171344
77   | user-info | view   | 1407171400
77   | traffic   | select | 1407171407
77   | -         | login  | 1407171440
65   | -         | login  | 1407171505
65   | change    | select | 1407564830
65   | change    | pay    | 1407579352
65   | -         | login  | 1407579442
65   | -         | login  | 1407579765
77   | -         | login  | 1407579866
77   | -         | login  | 1407680000
77   | promotion | bank   | 1407171400
77   | promotion | pay    | 1408100946
65   | traffic   | select | 1407171400
65   | traffic   | pay    | 1408114734
65   | -         | login  | 1408125796
65   | service   | extend | 1408192741

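I have many rows, with different customer IDs. I want to count the number of active sessions for each customer. That is, I want to count how many times a customer logged in and then performed at least one other action after logging in. Two consecutive logins with no action in between therefore do not count as a session. The next login can serve as a proxy for the end of a session. For user 77, the first three rows (actions: login, view, select) make up a session, but the following login does not, because no other action was taken after it. So in the table above, user 77 has 2 active sessions and user 65 has 3 active sessions.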
The active sessions are as follows (repeated logins with no action in between are removed):
user | tab       | action | time
77   | -         | login  | 1407171344
77   | user-info | view   | 1407171400
77   | traffic   | select | 1407171407
65   | -         | login  | 1407171505
65   | change    | select | 1407564830
65   | change    | pay    | 1407579352
65   | -         | login  | 1407579765
77   | -         | login  | 1407680000
77   | promotion | bank   | 1407171400
77   | promotion | pay    | 1408100946
65   | traffic   | select | 1407171400
65   | traffic   | pay    | 1408114734
65   | -         | login  | 1408125796
65   | service   | extend | 1408192741

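How can I count the active sessions? Thanks in advance.
P.S. I have already tried importing the data into R, but the data set is large and loops in R seem slow, so I would like to stick with SQL as far as possible.
Answer 0 (score: 0)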
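This assumes that a user cannot have more than one session at the same time; if they can, you will need a third parameter to track the sessions separately. It also assumes that your data is already in a table named user_action, which currently looks like this:
SELECT user, action, time FROM user_action ORDER BY user, time;
user action time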
65 select 1407171400
65 login 1407171505
65 select 1407564830
65 pay 1407579352
65 login 1407579442
65 login 1407579765
65 pay 1408114734
65 login 1408125796
65 extend 1408192741
77 login 1407171344
77 bank 1407171400
77 view 1407171400
77 select 1407171407
77 login 1407171440
77 login 1407579866
77 login 1407680000
77 pay 1408100946
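Copy the records, ordered by user and then by time, into a new temporary table for analysis, adding a new column activity_number.
The last session of each customer may have no login record marking its end, so we append one extra login row per customer, one second after that customer's last activity.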
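DROP TABLE IF EXISTS user_action_temp;
SET @activity_number := 0;
-- Caveat: MySQL does not guarantee the evaluation order of user-variable
-- assignments relative to ORDER BY, so verify the numbering in the output.
CREATE TABLE user_action_temp
AS
SELECT @activity_number := @activity_number + 1 AS activity_number, user, action, time
FROM
    (SELECT user, action, time FROM user_action
     -- UNION ALL keeps every original row; plain UNION would drop exact duplicates
     UNION ALL
     -- sentinel login per user, one second after that user's last activity
     SELECT user, 'login' AS action, MAX(time) + 1 AS time FROM user_action GROUP BY user) AS USER_ACTIVITY
ORDER BY user, time;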
Your data now looks like this:
SELECT * FROM user_action_temp ORDER BY user, time;
activity_number user action time
1 65 select 1407171400
2 65 login 1407171505
3 65 select 1407564830
4 65 pay 1407579352
5 65 login 1407579442
6 65 login 1407579765
7 65 pay 1408114734
8 65 login 1408125796
9 65 extend 1408192741
10 65 login 1408192742
11 77 login 1407171344
12 77 bank 1407171400
13 77 view 1407171400
14 77 select 1407171407
15 77 login 1407171440
16 77 login 1407579866
17 77 login 1407680000
18 77 pay 1408100946
19 77 login 1408100947
Next, self-join this table. First, define two variables that are used to assign a login number to each login activity.
SET @login_number1:=0;
SET @login_number2:=0;
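The numbered logins (table 1) are self-joined to the same list (table 2) so that each login is matched with the next login of the same user. The activity count is the number of activities between the two logins.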
SELECT * FROM
(
    SELECT logins_1.user,
           logins_1.time AS session_start,
           logins_2.time AS session_end,
           -- activities strictly between the two logins; 0 for back-to-back
           -- logins or when there is no next login (logins_2 is NULL)
           CASE WHEN (logins_2.activity_number - logins_1.activity_number) > 1
                THEN (logins_2.activity_number - logins_1.activity_number - 1)
                ELSE 0
           END AS activity_count
    FROM
        -- every login, numbered in (user, time) order
        (SELECT @login_number1 := @login_number1 + 1 AS login_number,
                activity_number, user, action, time
         FROM user_action_temp
         WHERE action = 'login'
         ORDER BY user, time) AS logins_1
    LEFT OUTER JOIN
        -- the same numbered list again, used to look up the next login
        (SELECT @login_number2 := @login_number2 + 1 AS login_number2,
                activity_number, user, action, time
         FROM user_action_temp
         WHERE action = 'login'
         ORDER BY user, time) AS logins_2
    ON logins_1.login_number = (logins_2.login_number2 - 1)
       AND logins_1.user = logins_2.user
) AS RESULT;
This gives a summary of all user sessions:
user session_start session_end activity_count
65 1407171505 1407579442 2
65 1407579442 1407579765 0
65 1407579765 1408125796 1
65 1408125796 1408192742 1
65 1408192742 <null> 0
77 1407171344 1407171440 3
77 1407171440 1407579866 0
77 1407579866 1407680000 0
77 1407680000 1408100947 1
77 1408100947 <null> 0
You can filter the above query with WHERE activity_count > 0 to get what you want.
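The question asks for a count per user rather than a list of sessions. As a rough sketch of one way to get that number directly, the snippet below uses the LEAD window function in place of the user-variable self-join: a login counts as an active session when the same user's next row is anything other than a login. This is a different technique from the one above, and it makes several assumptions: window functions are available (MySQL 8.0 or later), the table is named user_action as above, and no login shares a timestamp with another row of the same user. The Python sqlite3 harness and the names ROWS, QUERY and count_active_sessions are only there to make the example self-contained; the query text itself is what you would adapt.

```python
import sqlite3

# Sample rows copied from the question: (user, tab, action, time).
ROWS = [
    (77, "-", "login", 1407171344),
    (77, "user-info", "view", 1407171400),
    (77, "traffic", "select", 1407171407),
    (77, "-", "login", 1407171440),
    (65, "-", "login", 1407171505),
    (65, "change", "select", 1407564830),
    (65, "change", "pay", 1407579352),
    (65, "-", "login", 1407579442),
    (65, "-", "login", 1407579765),
    (77, "-", "login", 1407579866),
    (77, "-", "login", 1407680000),
    (77, "promotion", "bank", 1407171400),
    (77, "promotion", "pay", 1408100946),
    (65, "traffic", "select", 1407171400),
    (65, "traffic", "pay", 1408114734),
    (65, "-", "login", 1408125796),
    (65, "service", "extend", 1408192741),
]

# A login is "active" when the user's next row (by time) is not a login.
# A trailing login has next_action = NULL, so the <> comparison is not
# true and that login is not counted.
QUERY = """
SELECT user, COUNT(*) AS active_sessions
FROM (
    SELECT user, action,
           LEAD(action) OVER (PARTITION BY user ORDER BY time) AS next_action
    FROM user_action
) AS ordered
WHERE action = 'login'
  AND next_action <> 'login'
GROUP BY user
ORDER BY user
"""


def count_active_sessions(rows):
    """Load rows into an in-memory table and return (user, count) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_action "
        "(user INTEGER, tab TEXT, action TEXT, time INTEGER)"
    )
    conn.executemany("INSERT INTO user_action VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(count_active_sessions(ROWS))
```

On the sample data this agrees with the counts stated in the question (3 for user 65, 2 for user 77). Users with no active session do not appear in the result at all; if you need them with a count of 0, you would have to join back to a list of all users.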