我想在底部切片样本数据......
这样每个会话(会话 - 从登录到下次登录之前的操作的事件)看起来像这样:
login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016
getUserPreferences,179,2016-11-10 05:28:30.575,hbhimani,11/10/2016
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016
getPreference,4,2016-11-10 05:28:55.961,hbhimani,11/10/2016
constructFromSession,4,2016-11-10 05:28:56.108,hbhimani,11/10/2016
getUserPreferences,4,2016-11-10 05:28:56.112,hbhimani,11/10/2016
getUserPreferences,3,2016-11-10 05:28:56.116,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.238,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.513,hbhimani,11/10/2016
getQuickSearchInitInfo,3,2016-11-10 05:28:58.936,hbhimani,11/10/2016
getQuickSearchInitInfo2,4,2016-11-10 05:28:59.315,hbhimani,11/10/2016
我想计算一些记录和getPreference动作的发生。将显示为一个看起来像这样的记录:
day,User,session_duration(min),getPreference_count,total_session_actions
11/10/2016,hbhimani, 180, 2, 11
当我有多个会话时,我的挑战正在发生。我不知道如何动态切片索引。
示例数据:
Action,Duration,_time,User,day
login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016
getUserPreferences,179,2016-11-10 05:28:30.575,hbhimani,11/10/2016
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016
getPreference,4,2016-11-10 05:28:55.961,hbhimani,11/10/2016
constructFromSession,4,2016-11-10 05:28:56.108,hbhimani,11/10/2016
getUserPreferences,4,2016-11-10 05:28:56.112,hbhimani,11/10/2016
getUserPreferences,3,2016-11-10 05:28:56.116,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.238,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.513,hbhimani,11/10/2016
getQuickSearchInitInfo,3,2016-11-10 05:28:58.936,hbhimani,11/10/2016
getQuickSearchInitInfo2,4,2016-11-10 05:28:59.315,hbhimani,11/10/2016
login,3,2016-11-10 05:29:29.202,hbhimani,11/10/2016
getSummary,4042,2016-11-10 05:29:33.246,hbhimani,11/10/2016
getEnclosures,457,2016-11-10 05:29:34.372,hbhimani,11/10/2016
getAuditTrail,1061,2016-11-10 05:29:36.034,hbhimani,11/10/2016
getRelatedDefects,5,2016-11-10 05:29:36.586,hbhimani,11/10/2016
getServiceRequests,5,2016-11-10 05:29:36.864,hbhimani,11/10/2016
getForeignBugs,270,2016-11-10 05:29:37.408,hbhimani,11/10/2016
getEnclosures,455,2016-11-10 05:29:50.087,hbhimani,11/10/2016
getSummary,5505,2016-11-10 05:32:26.584,hbhimani,11/10/2016
getEnclosures,459,2016-11-10 05:32:27.940,hbhimani,11/10/2016
login,997,2016-11-10 05:32:29.480,anshanno,11/10/2016
getRelatedDefects,5,2016-11-10 05:32:30.027,anshanno,11/10/2016
getServiceRequests,5,2016-11-10 05:32:30.306,anshanno,11/10/2016
getForeignBugs,6,2016-11-10 05:32:30.585,anshanno,11/10/2016
答案 0 :(得分:1)
IIUC您可以按如下方式对数据进行分组:
原创DF:
In [62]: df
Out[62]:
Action Duration _time User day
0 login 4 2016-11-10 05:28:30.396 hbhimani 2016-11-10
1 getUserPreferences 179 2016-11-10 05:28:30.575 hbhimani 2016-11-10
2 getPreference 3 2016-11-10 05:28:55.686 hbhimani 2016-11-10
3 getPreference 4 2016-11-10 05:28:55.961 hbhimani 2016-11-10
4 constructFromSession 4 2016-11-10 05:28:56.108 hbhimani 2016-11-10
5 getUserPreferences 4 2016-11-10 05:28:56.112 hbhimani 2016-11-10
6 getUserPreferences 3 2016-11-10 05:28:56.116 hbhimani 2016-11-10
7 setBooleanPreference 4 2016-11-10 05:28:56.238 hbhimani 2016-11-10
8 setBooleanPreference 4 2016-11-10 05:28:56.513 hbhimani 2016-11-10
9 getQuickSearchInitInfo 3 2016-11-10 05:28:58.936 hbhimani 2016-11-10
10 getQuickSearchInitInfo2 4 2016-11-10 05:28:59.315 hbhimani 2016-11-10
11 login 3 2016-11-10 05:29:29.202 hbhimani 2016-11-10
12 getSummary 4042 2016-11-10 05:29:33.246 hbhimani 2016-11-10
13 getEnclosures 457 2016-11-10 05:29:34.372 hbhimani 2016-11-10
14 getAuditTrail 1061 2016-11-10 05:29:36.034 hbhimani 2016-11-10
15 getRelatedDefects 5 2016-11-10 05:29:36.586 hbhimani 2016-11-10
16 getServiceRequests 5 2016-11-10 05:29:36.864 hbhimani 2016-11-10
17 getForeignBugs 270 2016-11-10 05:29:37.408 hbhimani 2016-11-10
18 getEnclosures 455 2016-11-10 05:29:50.087 hbhimani 2016-11-10
19 getSummary 5505 2016-11-10 05:32:26.584 hbhimani 2016-11-10
20 getEnclosures 459 2016-11-10 05:32:27.940 hbhimani 2016-11-10
21 login 997 2016-11-10 05:32:29.480 anshanno 2016-11-10
22 getRelatedDefects 5 2016-11-10 05:32:30.027 anshanno 2016-11-10
23 getServiceRequests 5 2016-11-10 05:32:30.306 anshanno 2016-11-10
24 getForeignBugs 6 2016-11-10 05:32:30.585 anshanno 2016-11-10
将它分组:
In [63]: grp = df.groupby(['User', df.Action.eq('login').cumsum()])
打印所有群组:
In [64]: for g, x in grp:
...: print(x)
...:
Action Duration _time User day
21 login 997 2016-11-10 05:32:29.480 anshanno 2016-11-10
22 getRelatedDefects 5 2016-11-10 05:32:30.027 anshanno 2016-11-10
23 getServiceRequests 5 2016-11-10 05:32:30.306 anshanno 2016-11-10
24 getForeignBugs 6 2016-11-10 05:32:30.585 anshanno 2016-11-10
Action Duration _time User day
0 login 4 2016-11-10 05:28:30.396 hbhimani 2016-11-10
1 getUserPreferences 179 2016-11-10 05:28:30.575 hbhimani 2016-11-10
2 getPreference 3 2016-11-10 05:28:55.686 hbhimani 2016-11-10
3 getPreference 4 2016-11-10 05:28:55.961 hbhimani 2016-11-10
4 constructFromSession 4 2016-11-10 05:28:56.108 hbhimani 2016-11-10
5 getUserPreferences 4 2016-11-10 05:28:56.112 hbhimani 2016-11-10
6 getUserPreferences 3 2016-11-10 05:28:56.116 hbhimani 2016-11-10
7 setBooleanPreference 4 2016-11-10 05:28:56.238 hbhimani 2016-11-10
8 setBooleanPreference 4 2016-11-10 05:28:56.513 hbhimani 2016-11-10
9 getQuickSearchInitInfo 3 2016-11-10 05:28:58.936 hbhimani 2016-11-10
10 getQuickSearchInitInfo2 4 2016-11-10 05:28:59.315 hbhimani 2016-11-10
Action Duration _time User day
11 login 3 2016-11-10 05:29:29.202 hbhimani 2016-11-10
12 getSummary 4042 2016-11-10 05:29:33.246 hbhimani 2016-11-10
13 getEnclosures 457 2016-11-10 05:29:34.372 hbhimani 2016-11-10
14 getAuditTrail 1061 2016-11-10 05:29:36.034 hbhimani 2016-11-10
15 getRelatedDefects 5 2016-11-10 05:29:36.586 hbhimani 2016-11-10
16 getServiceRequests 5 2016-11-10 05:29:36.864 hbhimani 2016-11-10
17 getForeignBugs 270 2016-11-10 05:29:37.408 hbhimani 2016-11-10
18 getEnclosures 455 2016-11-10 05:29:50.087 hbhimani 2016-11-10
19 getSummary 5505 2016-11-10 05:32:26.584 hbhimani 2016-11-10
20 getEnclosures 459 2016-11-10 05:32:27.940 hbhimani 2016-11-10
说明:
In [71]: df['grp_id'] = df.Action.eq('login').cumsum()
In [72]: df[['Action','User','grp_id']]
Out[72]:
Action User grp_id
0 login hbhimani 1
1 getUserPreferences hbhimani 1
2 getPreference hbhimani 1
3 getPreference hbhimani 1
4 constructFromSession hbhimani 1
5 getUserPreferences hbhimani 1
6 getUserPreferences hbhimani 1
7 setBooleanPreference hbhimani 1
8 setBooleanPreference hbhimani 1
9 getQuickSearchInitInfo hbhimani 1
10 getQuickSearchInitInfo2 hbhimani 1
11 login hbhimani 2
12 getSummary hbhimani 2
13 getEnclosures hbhimani 2
14 getAuditTrail hbhimani 2
15 getRelatedDefects hbhimani 2
16 getServiceRequests hbhimani 2
17 getForeignBugs hbhimani 2
18 getEnclosures hbhimani 2
19 getSummary hbhimani 2
20 getEnclosures hbhimani 2
21 login anshanno 3
22 getRelatedDefects anshanno 3
23 getServiceRequests anshanno 3
24 getForeignBugs anshanno 3