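I'm very new to Stack and pandas and couldn't find anything like this. It looks more complicated than the typical operations, and I'd love to learn how to do it: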
The data set:
Day Messages Time
Friday spam 8:05 AM
Tuesday eggs 8:45 AM
Friday smapeggs 9:03 AM
Monday eggseggs 1:05 PM
Tuesday baz 8:33 AM
Monday eggsspam 2:25 PM
Desired output 1:
Time ranges Number of Messages
8:00-9:00 AM 3
9:00-10:00 AM 1
1:00-2:30 PM 2
Desired output 2:
Day Most Active Time
Friday 8:00-10:00 AM
Tuesday 8:00-9:00 AM
Monday 1:00-2:30 PM
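The idea is to see which hours are most responsive in general, and which hours are most responsive on particular days. Thanks in advance!
Answer 0: (score: 0)
Try the following: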
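import pandas as pd

df_start = pd.DataFrame({'Day': ['Friday', 'Tuesday', 'Friday', 'Friday'],
                         'Message': ['spam', 'msg', 'eggs', 'another'],
                         'Time': ['8:05 AM', '9:45 AM', '10:34 AM', '8:45 AM']})
# resample() needs a DatetimeIndex, so move Time into the index and parse it.
df_start = df_start.set_index('Time')
df_start.index = pd.to_datetime(df_start.index, format='%I:%M %p')
# Count the messages that fall into each consecutive 2-hour bin.
df_result = df_start.resample("2h")['Message'].count()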
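If fixed one-hour buckets are enough, the same idea can be sketched on the data from the question by grouping on the hour of day instead of resampling. This is only a rough sketch: the frame name `df` and the helper column `Hour` are my own names, not anything from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Day": ["Friday", "Tuesday", "Friday", "Monday", "Tuesday", "Monday"],
    "Messages": ["spam", "eggs", "smapeggs", "eggseggs", "baz", "eggsspam"],
    "Time": ["8:05 AM", "8:45 AM", "9:03 AM", "1:05 PM", "8:33 AM", "2:25 PM"],
})

# Parse the 12-hour clock strings and keep only the hour of day (0-23).
# "Hour" is a helper column introduced for this sketch.
df["Hour"] = pd.to_datetime(df["Time"], format="%I:%M %p").dt.hour

# Count messages per hour; hours without any message simply do not appear.
hourly_counts = df.groupby("Hour")["Messages"].count()
print(hourly_counts)
```

Note that this gives one row per clock hour, so the two afternoon messages land in separate buckets (13 and 14) rather than in a single 1:00-2:30 PM range. Unequal ranges like that would need custom bin edges, for example with pd.cut.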
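For the second output, try something like the following:
# Resample each day's messages separately; the result is indexed by (Day, Time).
df_result = df_start.groupby("Day")['Message'].apply(lambda s: s.resample("2h").count())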
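The per-day counts still have to be reduced to one "most active" slot per day. Here is a minimal sketch of one way to do that on the question's data, assuming one-hour buckets and assuming that ties go to the earlier hour; the names `counts`, `Count`, `Hour` and `busiest` are made up for the example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Day": ["Friday", "Tuesday", "Friday", "Monday", "Tuesday", "Monday"],
    "Messages": ["spam", "eggs", "smapeggs", "eggseggs", "baz", "eggsspam"],
    "Time": ["8:05 AM", "8:45 AM", "9:03 AM", "1:05 PM", "8:33 AM", "2:25 PM"],
})
df["Hour"] = pd.to_datetime(df["Time"], format="%I:%M %p").dt.hour

# Number of messages for every (Day, Hour) pair that occurs in the data.
counts = df.groupby(["Day", "Hour"]).size().reset_index(name="Count")

# Within each day put the highest count first (earlier hour wins a tie),
# then keep only the first row per day.
busiest = (
    counts.sort_values(["Day", "Count", "Hour"], ascending=[True, False, True])
          .drop_duplicates("Day")
          .set_index("Day")["Hour"]
)
print(busiest)
```

With so few rows most days are ties, so the tie-breaking rule decides the answer here. On real data you may prefer to report all tied hours instead of just the first.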