I have a dataframe of user events with the following information:
USER Timestamp day_of_week Busi_days Busi_hours
AAS 2017-07-11 09:31:44 Tuesday True True
AAS 2017-07-11 23:24:43 Tuesday True False
SAP 2017-07-11 11:29:40 Tuesday True True
SAP 2017-07-11 16:58:49 Tuesday True True
YAS 2017-07-11 15:26:57 Tuesday True True
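For anyone reproducing this, a minimal sketch that rebuilds the five sample rows and derives the three helper columns. The post does not say how Busi_days and Busi_hours were defined, so this assumes business days are Monday to Friday and business hours run from 09:00 to 17:59. Both rules are guesses that happen to match the sample values.

```python
import pandas as pd

# The five sample events from the table above.
df = pd.DataFrame({
    "USER": ["AAS", "AAS", "SAP", "SAP", "YAS"],
    "Timestamp": pd.to_datetime([
        "2017-07-11 09:31:44",
        "2017-07-11 23:24:43",
        "2017-07-11 11:29:40",
        "2017-07-11 16:58:49",
        "2017-07-11 15:26:57",
    ]),
})

# Weekday name of each event, e.g. "Tuesday".
df["day_of_week"] = df["Timestamp"].dt.day_name()

# Assumption: business days are Monday-Friday (dayofweek 0-4).
df["Busi_days"] = df["Timestamp"].dt.dayofweek < 5

# Assumption: business hours are 09:00-17:59, i.e. hour 9 through 17 inclusive.
df["Busi_hours"] = df["Timestamp"].dt.hour.between(9, 17)

print(df)
```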
I need to compute some features, for example which USERs are highly active, based on how many clicks each one has. To get the click counts I used
df.groupby(['USER']).count()
USER ts Timestamp day_of_week Busi_days Busi_hours
AAS 7 2017-07-11 09:31:44 Tuesday True True
AAS 11 2017-07-11 23:24:43 Tuesday True False
SAP 9 2017-07-11 11:29:40 Tuesday True True
SAP 14 2017-07-11 16:58:49 Tuesday True True
YAS 8 2017-07-11 15:26:57 Tuesday True True
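Note that groupby(...).count() counts non-null values in every column separately; for a per-user click count, groupby(...).size() returns a single Series with one row per user. A minimal sketch on a hypothetical event log (one row per click), assuming "highly active" means a click count at or above the 75th percentile of all users. That threshold is an illustrative choice, not something given in the question.

```python
import pandas as pd

# Hypothetical event log: one row per click.
events = pd.DataFrame({"USER": ["AAS", "AAS", "AAS", "SAP", "SAP", "YAS"]})

# Number of clicks per user (one row per user).
clicks = events.groupby("USER").size().rename("clicks")

# Assumed rule: "highly active" = click count at or above the
# 75th percentile of all users' click counts.
threshold = clicks.quantile(0.75)
highly_active = clicks >= threshold

print(pd.DataFrame({"clicks": clicks, "highly_active": highly_active}))
```

Making the threshold a quantile rather than a fixed number keeps the rule meaningful as the total volume of events changes.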
Now I would like help or advice on how best to define the rules and logic for each feature, such as highly active. I tried ranking with
df.groupby(['USER'])['Timestamp'].rank("dense", ascending=False)
which throws a ValueError.
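Two things seem worth checking here. First, Series.rank takes axis as its first positional argument, so in pandas versions where the grouped call is forwarded to Series.rank, "dense" may be interpreted as an axis name; passing method="dense" by keyword avoids that ambiguity. Second, ranking Timestamp within each user orders that user's events by time rather than ranking users by activity. A minimal sketch that ranks the per-user click counts instead, on hypothetical data:

```python
import pandas as pd

# Hypothetical event log: one row per click.
events = pd.DataFrame({"USER": ["AAS", "AAS", "AAS", "SAP", "SAP", "YAS", "YAS"]})

# Clicks per user.
clicks = events.groupby("USER").size()

# Dense rank by activity: the most active user gets rank 1,
# tied users share a rank, and no rank numbers are skipped.
activity_rank = clicks.rank(method="dense", ascending=False).astype(int)

print(activity_rank)
```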
Another option was qcut:
TA_log[['USER', 'Timestamp']].groupby(['USER']).count().apply(lambda x: pd.qcut(x, np.linspace(0,7)))
which throws an IndexError:
IndexError: ('index 294117 is out of bounds for axis 0 with size 257354', 'occurred at index ts')
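A likely cause: np.linspace(0,7) returns 50 evenly spaced values from 0 to 7, whereas pd.qcut expects q to be either an integer number of bins or an array of quantiles between 0 and 1. A minimal sketch that applies qcut directly to the per-user click counts (no .apply needed), using np.linspace(0, 1, 5) for quartile edges. The event log and the band labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical event log: one row per click (users with 4, 3, 2 and 1 clicks).
events = pd.DataFrame({"USER": ["AAS"] * 4 + ["SAP"] * 3 + ["YAS"] * 2 + ["ZED"]})

# Clicks per user.
clicks = events.groupby("USER").size()

# Quantile edges must lie in [0, 1]: 0, 0.25, 0.5, 0.75, 1 -> four quartile bins.
quartiles = np.linspace(0, 1, 5)

# Assign each user to an activity band by quartile of click count.
activity_band = pd.qcut(clicks, q=quartiles, labels=["low", "medium", "high", "very high"])

print(activity_band)
```

On real data where many users share the same click count, several quantile edges can coincide and qcut raises an error about non-unique bin edges; passing duplicates="drop" (and then omitting or shortening labels) merges those bins.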
Any help would be great.