Based on events

Time: 2018-01-24 15:49:00

Tags: python pandas numpy group-by data-science

I have a DataFrame of user events containing the following information:

USER    Timestamp   day_of_week Busi_days   Busi_hours
AAS 2017-07-11 09:31:44 Tuesday True    True
AAS 2017-07-11 23:24:43 Tuesday True    False
SAP 2017-07-11 11:29:40 Tuesday True    True
SAP 2017-07-11 16:58:49 Tuesday True    True
YAS 2017-07-11 15:26:57 Tuesday True    True

I need to compute some features from this, for example which USERs are highly active. The number of clicks per user I already know how to get:

df.groupby(['USER']).count()

USER ts     Timestamp       day_of_week Busi_days   Busi_hours
AAS 7   2017-07-11 09:31:44 Tuesday True    True
AAS 11  2017-07-11 23:24:43 Tuesday True    False
SAP 9   2017-07-11 11:29:40 Tuesday True    True
SAP 14  2017-07-11 16:58:49 Tuesday True    True
YAS 8   2017-07-11 15:26:57 Tuesday True    True
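
For reference, here is a minimal sketch of the kind of per-user feature table I have in mind, built from the five sample rows above. The feature names (`clicks`, `busi_hours_share`, `first_seen`, `last_seen`) are my own illustrative choices, and the named-aggregation syntax assumes pandas 0.25 or newer.

```python
import pandas as pd

# Toy frame rebuilt from the five sample rows shown in the question.
df = pd.DataFrame({
    "USER": ["AAS", "AAS", "SAP", "SAP", "YAS"],
    "Timestamp": pd.to_datetime([
        "2017-07-11 09:31:44",
        "2017-07-11 23:24:43",
        "2017-07-11 11:29:40",
        "2017-07-11 16:58:49",
        "2017-07-11 15:26:57",
    ]),
    "day_of_week": ["Tuesday"] * 5,
    "Busi_days": [True] * 5,
    "Busi_hours": [True, False, True, True, True],
})

# One row per user. The output column names are illustrative, not from the data.
features = df.groupby("USER").agg(
    clicks=("Timestamp", "count"),            # number of events per user
    busi_hours_share=("Busi_hours", "mean"),  # fraction of events in business hours
    first_seen=("Timestamp", "min"),
    last_seen=("Timestamp", "max"),
)

print(features)
```

Unlike `df.groupby(['USER']).count()`, which repeats the same count in every column, this gives one named column per feature.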

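Now I need help or suggestions on how best to define the rules and logic for computing each feature, such as highly active. I tried ranking with

df.groupby(['USER'])['Timestamp'].rank("dense", ascending=False)

which throws a ValueError. qcut is another option: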

TA_log[['USER', 'Timestamp']].groupby(['USER']).count().apply(lambda x: pd.qcut(x, np.linspace(0,7)))

which throws an IndexError:

IndexError: ('index 294117 is out of bounds for axis 0 with size 257354', 'occurred at index ts')
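
A possible explanation, offered as a guess rather than a confirmed diagnosis: `np.linspace(0, 7)` returns 50 values between 0 and 7, while `pd.qcut` expects quantile edges between 0 and 1, so any edge above 1 points past the end of the data. The sketch below ranks per-user click counts across users and bins them with edges inside [0, 1]. The user names, the counts, and the four tier labels are all hypothetical, and using quartiles as the cut-off for "highly active" is only one possible rule.

```python
import numpy as np
import pandas as pd

# Hypothetical per-user click counts; in practice these might come from
# something like df.groupby("USER")["Timestamp"].count().
clicks = pd.Series(
    [7, 11, 9, 14, 8, 3, 21, 5],
    index=["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"],
    name="clicks",
)

# Dense rank across users (not within each user): 1 = most clicks.
activity_rank = clicks.rank(method="dense", ascending=False).astype(int)

# Quantile edges must lie in [0, 1]; five evenly spaced edges give quartiles.
edges = np.linspace(0, 1, 5)
tier = pd.qcut(
    clicks,
    q=edges,
    labels=["low", "medium", "high", "highly active"],  # illustrative labels
)

summary = pd.DataFrame({"clicks": clicks, "rank": activity_rank, "tier": tier})
print(summary.sort_values("rank"))
```

With heavily tied counts the quantile edges can coincide, in which case `pd.qcut` raises unless `duplicates="drop"` is passed, and the number of labels then has to match the reduced number of bins.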

Any help would be great.

0 answers:

No answers