How do I select a range of rows between two values using pandas?

Asked: 2016-11-10 21:28:09

Tags: python pandas dataframe group-by

I want to slice the sample data at the bottom of this post...

so that each session (a session being all events from a login up to the last action before the next login) looks like this:

login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016
getUserPreferences,179,2016-11-10 05:28:30.575,hbhimani,11/10/2016
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016
getPreference,4,2016-11-10 05:28:55.961,hbhimani,11/10/2016
constructFromSession,4,2016-11-10 05:28:56.108,hbhimani,11/10/2016
getUserPreferences,4,2016-11-10 05:28:56.112,hbhimani,11/10/2016
getUserPreferences,3,2016-11-10 05:28:56.116,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.238,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.513,hbhimani,11/10/2016
getQuickSearchInitInfo,3,2016-11-10 05:28:58.936,hbhimani,11/10/2016
getQuickSearchInitInfo2,4,2016-11-10 05:28:59.315,hbhimani,11/10/2016

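I want to count the records in each session and the occurrences of the getPreference action. The result should be a single record per session that looks like this: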

day,User,session_duration(min),getPreference_count,total_session_actions
11/10/2016,hbhimani, 180, 2, 11

My difficulty arises when there are multiple sessions: I don't know how to slice the index dynamically.

Sample data:

Action,Duration,_time,User,day
login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016
getUserPreferences,179,2016-11-10 05:28:30.575,hbhimani,11/10/2016
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016
getPreference,4,2016-11-10 05:28:55.961,hbhimani,11/10/2016
constructFromSession,4,2016-11-10 05:28:56.108,hbhimani,11/10/2016
getUserPreferences,4,2016-11-10 05:28:56.112,hbhimani,11/10/2016
getUserPreferences,3,2016-11-10 05:28:56.116,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.238,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.513,hbhimani,11/10/2016
getQuickSearchInitInfo,3,2016-11-10 05:28:58.936,hbhimani,11/10/2016
getQuickSearchInitInfo2,4,2016-11-10 05:28:59.315,hbhimani,11/10/2016
login,3,2016-11-10 05:29:29.202,hbhimani,11/10/2016
getSummary,4042,2016-11-10 05:29:33.246,hbhimani,11/10/2016
getEnclosures,457,2016-11-10 05:29:34.372,hbhimani,11/10/2016
getAuditTrail,1061,2016-11-10 05:29:36.034,hbhimani,11/10/2016
getRelatedDefects,5,2016-11-10 05:29:36.586,hbhimani,11/10/2016
getServiceRequests,5,2016-11-10 05:29:36.864,hbhimani,11/10/2016
getForeignBugs,270,2016-11-10 05:29:37.408,hbhimani,11/10/2016
getEnclosures,455,2016-11-10 05:29:50.087,hbhimani,11/10/2016
getSummary,5505,2016-11-10 05:32:26.584,hbhimani,11/10/2016
getEnclosures,459,2016-11-10 05:32:27.940,hbhimani,11/10/2016
login,997,2016-11-10 05:32:29.480,anshanno,11/10/2016
getRelatedDefects,5,2016-11-10 05:32:30.027,anshanno,11/10/2016
getServiceRequests,5,2016-11-10 05:32:30.306,anshanno,11/10/2016
getForeignBugs,6,2016-11-10 05:32:30.585,anshanno,11/10/2016

1 Answer:

Answer 0: (score: 1)

IIUC, you can group your data as follows:

Original DF:

In [62]: df
Out[62]:
                     Action  Duration                   _time      User        day
0                     login         4 2016-11-10 05:28:30.396  hbhimani 2016-11-10
1        getUserPreferences       179 2016-11-10 05:28:30.575  hbhimani 2016-11-10
2             getPreference         3 2016-11-10 05:28:55.686  hbhimani 2016-11-10
3             getPreference         4 2016-11-10 05:28:55.961  hbhimani 2016-11-10
4      constructFromSession         4 2016-11-10 05:28:56.108  hbhimani 2016-11-10
5        getUserPreferences         4 2016-11-10 05:28:56.112  hbhimani 2016-11-10
6        getUserPreferences         3 2016-11-10 05:28:56.116  hbhimani 2016-11-10
7      setBooleanPreference         4 2016-11-10 05:28:56.238  hbhimani 2016-11-10
8      setBooleanPreference         4 2016-11-10 05:28:56.513  hbhimani 2016-11-10
9    getQuickSearchInitInfo         3 2016-11-10 05:28:58.936  hbhimani 2016-11-10
10  getQuickSearchInitInfo2         4 2016-11-10 05:28:59.315  hbhimani 2016-11-10
11                    login         3 2016-11-10 05:29:29.202  hbhimani 2016-11-10
12               getSummary      4042 2016-11-10 05:29:33.246  hbhimani 2016-11-10
13            getEnclosures       457 2016-11-10 05:29:34.372  hbhimani 2016-11-10
14            getAuditTrail      1061 2016-11-10 05:29:36.034  hbhimani 2016-11-10
15        getRelatedDefects         5 2016-11-10 05:29:36.586  hbhimani 2016-11-10
16       getServiceRequests         5 2016-11-10 05:29:36.864  hbhimani 2016-11-10
17           getForeignBugs       270 2016-11-10 05:29:37.408  hbhimani 2016-11-10
18            getEnclosures       455 2016-11-10 05:29:50.087  hbhimani 2016-11-10
19               getSummary      5505 2016-11-10 05:32:26.584  hbhimani 2016-11-10
20            getEnclosures       459 2016-11-10 05:32:27.940  hbhimani 2016-11-10
21                    login       997 2016-11-10 05:32:29.480  anshanno 2016-11-10
22        getRelatedDefects         5 2016-11-10 05:32:30.027  anshanno 2016-11-10
23       getServiceRequests         5 2016-11-10 05:32:30.306  anshanno 2016-11-10
24           getForeignBugs         6 2016-11-10 05:32:30.585  anshanno 2016-11-10
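
The frame above shows `_time` and `day` already parsed as datetimes. The answer does not say how it was loaded, so the following is only a minimal sketch of one way to get there, assuming the question's CSV layout. Only a few of the sample rows are inlined to keep it short, and `csv_text` is a name introduced here for illustration.

```python
import io

import pandas as pd

# A few rows of the question's sample data, inlined so the sketch runs on its own.
csv_text = """Action,Duration,_time,User,day
login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016
getUserPreferences,179,2016-11-10 05:28:30.575,hbhimani,11/10/2016
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016
login,3,2016-11-10 05:29:29.202,hbhimani,11/10/2016
getSummary,4042,2016-11-10 05:29:33.246,hbhimani,11/10/2016
login,997,2016-11-10 05:32:29.480,anshanno,11/10/2016
getRelatedDefects,5,2016-11-10 05:32:30.027,anshanno,11/10/2016
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert both time columns explicitly; `day` is assumed to be month/day/year.
df["_time"] = pd.to_datetime(df["_time"])
df["day"] = pd.to_datetime(df["day"], format="%m/%d/%Y")

print(df.dtypes)
```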

Group it:

In [63]: grp = df.groupby(['User', df.Action.eq('login').cumsum()])

Print all groups:

In [64]: for g, x in grp:
    ...:     print(x)
    ...:
                Action  Duration                   _time      User        day
21               login       997 2016-11-10 05:32:29.480  anshanno 2016-11-10
22   getRelatedDefects         5 2016-11-10 05:32:30.027  anshanno 2016-11-10
23  getServiceRequests         5 2016-11-10 05:32:30.306  anshanno 2016-11-10
24      getForeignBugs         6 2016-11-10 05:32:30.585  anshanno 2016-11-10
                     Action  Duration                   _time      User        day
0                     login         4 2016-11-10 05:28:30.396  hbhimani 2016-11-10
1        getUserPreferences       179 2016-11-10 05:28:30.575  hbhimani 2016-11-10
2             getPreference         3 2016-11-10 05:28:55.686  hbhimani 2016-11-10
3             getPreference         4 2016-11-10 05:28:55.961  hbhimani 2016-11-10
4      constructFromSession         4 2016-11-10 05:28:56.108  hbhimani 2016-11-10
5        getUserPreferences         4 2016-11-10 05:28:56.112  hbhimani 2016-11-10
6        getUserPreferences         3 2016-11-10 05:28:56.116  hbhimani 2016-11-10
7      setBooleanPreference         4 2016-11-10 05:28:56.238  hbhimani 2016-11-10
8      setBooleanPreference         4 2016-11-10 05:28:56.513  hbhimani 2016-11-10
9    getQuickSearchInitInfo         3 2016-11-10 05:28:58.936  hbhimani 2016-11-10
10  getQuickSearchInitInfo2         4 2016-11-10 05:28:59.315  hbhimani 2016-11-10
                Action  Duration                   _time      User        day
11               login         3 2016-11-10 05:29:29.202  hbhimani 2016-11-10
12          getSummary      4042 2016-11-10 05:29:33.246  hbhimani 2016-11-10
13       getEnclosures       457 2016-11-10 05:29:34.372  hbhimani 2016-11-10
14       getAuditTrail      1061 2016-11-10 05:29:36.034  hbhimani 2016-11-10
15   getRelatedDefects         5 2016-11-10 05:29:36.586  hbhimani 2016-11-10
16  getServiceRequests         5 2016-11-10 05:29:36.864  hbhimani 2016-11-10
17      getForeignBugs       270 2016-11-10 05:29:37.408  hbhimani 2016-11-10
18       getEnclosures       455 2016-11-10 05:29:50.087  hbhimani 2016-11-10
19          getSummary      5505 2016-11-10 05:32:26.584  hbhimani 2016-11-10
20       getEnclosures       459 2016-11-10 05:32:27.940  hbhimani 2016-11-10
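
Groups like the ones printed above can be reduced to one record per session, which is the shape the question asks for. The following is a rough sketch rather than part of the original answer. It assumes session duration means the time from the first to the last event of a session, reported in seconds, since the question does not define it. The names `session`, `is_get_pref`, `session_start`, `session_end` and `session_duration_s` are introduced here for illustration, and the sample is shortened to the full first session plus a few rows of the others. It relies on named aggregation, which requires pandas 0.25 or later.

```python
import io

import pandas as pd

# First session in full, later sessions truncated, so the sketch stays short.
csv_text = """Action,Duration,_time,User,day
login,4,2016-11-10 05:28:30.396,hbhimani,11/10/2016
getUserPreferences,179,2016-11-10 05:28:30.575,hbhimani,11/10/2016
getPreference,3,2016-11-10 05:28:55.686,hbhimani,11/10/2016
getPreference,4,2016-11-10 05:28:55.961,hbhimani,11/10/2016
constructFromSession,4,2016-11-10 05:28:56.108,hbhimani,11/10/2016
getUserPreferences,4,2016-11-10 05:28:56.112,hbhimani,11/10/2016
getUserPreferences,3,2016-11-10 05:28:56.116,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.238,hbhimani,11/10/2016
setBooleanPreference,4,2016-11-10 05:28:56.513,hbhimani,11/10/2016
getQuickSearchInitInfo,3,2016-11-10 05:28:58.936,hbhimani,11/10/2016
getQuickSearchInitInfo2,4,2016-11-10 05:28:59.315,hbhimani,11/10/2016
login,3,2016-11-10 05:29:29.202,hbhimani,11/10/2016
getSummary,4042,2016-11-10 05:29:33.246,hbhimani,11/10/2016
getEnclosures,457,2016-11-10 05:29:34.372,hbhimani,11/10/2016
login,997,2016-11-10 05:32:29.480,anshanno,11/10/2016
getRelatedDefects,5,2016-11-10 05:32:30.027,anshanno,11/10/2016
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["_time"])

# Same grouping key as in the answer: a counter that increases at every login.
# It is renamed so it does not clash with the Action column after reset_index().
session = df["Action"].eq("login").cumsum().rename("session")

summary = (
    df.assign(is_get_pref=df["Action"].eq("getPreference"))
    .groupby(["day", "User", session])
    .agg(
        session_start=("_time", "min"),
        session_end=("_time", "max"),
        getPreference_count=("is_get_pref", "sum"),
        total_session_actions=("Action", "size"),
    )
    .reset_index()
)

# Assumption: duration = last event time minus first event time, in seconds.
summary["session_duration_s"] = (
    summary["session_end"] - summary["session_start"]
).dt.total_seconds()

print(summary)
```

For the first hbhimani session this gives 2 getPreference actions out of 11 total, matching the counts in the question's desired record. Dividing `session_duration_s` by 60 would give minutes if that is the unit wanted.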

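Explanation: `df.Action.eq('login')` is True on every login row, so its cumulative sum increases by one at each login and every row carries the number of the most recent login: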

In [71]: df['grp_id'] = df.Action.eq('login').cumsum()

In [72]: df[['Action','User','grp_id']]
Out[72]:
                     Action      User  grp_id
0                     login  hbhimani       1
1        getUserPreferences  hbhimani       1
2             getPreference  hbhimani       1
3             getPreference  hbhimani       1
4      constructFromSession  hbhimani       1
5        getUserPreferences  hbhimani       1
6        getUserPreferences  hbhimani       1
7      setBooleanPreference  hbhimani       1
8      setBooleanPreference  hbhimani       1
9    getQuickSearchInitInfo  hbhimani       1
10  getQuickSearchInitInfo2  hbhimani       1
11                    login  hbhimani       2
12               getSummary  hbhimani       2
13            getEnclosures  hbhimani       2
14            getAuditTrail  hbhimani       2
15        getRelatedDefects  hbhimani       2
16       getServiceRequests  hbhimani       2
17           getForeignBugs  hbhimani       2
18            getEnclosures  hbhimani       2
19               getSummary  hbhimani       2
20            getEnclosures  hbhimani       2
21                    login  anshanno       3
22        getRelatedDefects  anshanno       3
23       getServiceRequests  anshanno       3
24           getForeignBugs  anshanno       3
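
One caveat: `grp_id` is a single running count over the whole frame. In the sample data each user's rows are contiguous, so pairing it with `User` works. If rows from different users were interleaved (an assumption, the sample shows no such case), another user's login would bump the counter in the middle of someone else's session. A possible variation, sketched below on a made-up log with hypothetical users `alice` and `bob`, is to count logins per user instead.

```python
import pandas as pd

# Hypothetical interleaved log, not taken from the question:
# bob logs in while alice's single session is still in progress.
df = pd.DataFrame(
    {
        "Action": ["login", "getSummary", "login", "getEnclosures", "getSummary"],
        "User": ["alice", "alice", "bob", "alice", "bob"],
    }
)

is_login = df["Action"].eq("login").astype(int)

# Global counter, as in the answer: increases at every login, whoever it belongs to.
df["global_id"] = is_login.cumsum()

# Per-user counter: increases only at that user's own logins.
df["user_session"] = is_login.groupby(df["User"]).cumsum()

n_global = df.groupby(["User", "global_id"]).ngroups
n_per_user = df.groupby(["User", "user_session"]).ngroups

print(df)
print(n_global, n_per_user)
```

In this made-up log the global counter splits alice's one session into two groups, while the per-user counter keeps it as one. On data where each user's rows are contiguous, as in the question, both approaches yield the same sessions.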