Weekly pandas groupby

Time: 2017-12-19 10:14:35

Tags: pandas-groupby

I have a DataFrame, df, that contains:

 Index         Date & Time        eventName       eventCount
 0            2017-08-09              ABC            24
 1            2017-08-09              CDE           140
 2            2017-08-10              CDE           150
 3            2017-08-11              DEF           200
 4            2017-08-11              ABC            20
 5            2017-08-16              CDE            10
 6            2017-08-16              ABC            15
 7            2017-08-17              CDE            10
 8            2017-08-17              DEF            50
 9            2017-08-18              DEF            80
     ...

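I want to sum eventCount by day of the week across all weeks, and plot the total events for each weekday (from MON to SUN). For example, the sums of the eventCount values: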

2017-08-09 and 2017-08-16 (Wednesdays) = 189
2017-08-10 and 2017-08-17 (Thursdays) = 210
2017-08-11 and 2017-08-18 (Fridays) = 300

I tried:

dailyOccurenceSum=df['eventCount'].groupby(lambda x: x.weekday).sum()                                      

I get this error: AttributeError: 'int' object has no attribute 'weekday'

1 answer:

Answer 0: (score: 1)

Starting with df:

df

   Index Date & Time eventName  eventCount
0      0  2017-08-09       ABC          24
1      1  2017-08-09       CDE         140
2      2  2017-08-10       CDE         150
3      3  2017-08-11       DEF         200
4      4  2017-08-11       ABC          20
5      5  2017-08-16       CDE          10
6      6  2017-08-16       ABC          15
7      7  2017-08-17       CDE          10
8      8  2017-08-17       DEF          50
9      9  2017-08-18       DEF          80

First, convert the Date & Time column to datetime:

import pandas as pd

df['Date & Time'] = pd.to_datetime(df['Date & Time'])

Next, call groupby + sum on the weekday names:

# .dt.weekday_name was removed in pandas 1.0; .dt.day_name() is the replacement
df = df.groupby(df['Date & Time'].dt.day_name())['eventCount'].sum()
df

Date & Time
Friday       300
Thursday     210
Wednesday    189
Name: eventCount, dtype: int64
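
As for the error in the question: when groupby is given a function, pandas calls it on each index label, and here the labels are the integers 0 to 9, which have no weekday attribute. Below is a minimal sketch of how the lambda approach could work once the dates are the index. It assumes the sample rows above are rebuilt by hand; the names events and by_weekday are illustrative and not from the original post.

```python
import pandas as pd

# Sample rows copied from the question ("events" is an illustrative name)
events = pd.DataFrame({
    "Date & Time": ["2017-08-09", "2017-08-09", "2017-08-10", "2017-08-11",
                    "2017-08-11", "2017-08-16", "2017-08-16", "2017-08-17",
                    "2017-08-17", "2017-08-18"],
    "eventName": ["ABC", "CDE", "CDE", "DEF", "ABC",
                  "CDE", "ABC", "CDE", "DEF", "DEF"],
    "eventCount": [24, 140, 150, 200, 20, 10, 15, 10, 50, 80],
})
events["Date & Time"] = pd.to_datetime(events["Date & Time"])

# With a datetime index, the function passed to groupby receives Timestamps.
# Timestamp.weekday() is a method (note the parentheses): Monday=0 ... Sunday=6.
by_weekday = (
    events.set_index("Date & Time")["eventCount"]
          .groupby(lambda ts: ts.weekday())
          .sum()
)
print(by_weekday)
```

The result is keyed by weekday number rather than name, so it already sorts in calendar order.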

If you want the result sorted by weekday, convert the index to an ordered categorical and call sort_index:

cat = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday']

df.index = pd.Categorical(df.index, categories=cat, ordered=True)
df = df.sort_index()
df

Wednesday    189
Thursday     210
Friday       300
Name: eventCount, dtype: int64
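
Since the question asks for a plot covering every day from MON to SUN, one option is to reindex the result so that days with no events appear as zero. This is a sketch under the same assumptions as the sample data above; the names events, days and totals are illustrative.

```python
import pandas as pd

# Sample rows copied from the question ("events" is an illustrative name)
events = pd.DataFrame({
    "Date & Time": ["2017-08-09", "2017-08-09", "2017-08-10", "2017-08-11",
                    "2017-08-11", "2017-08-16", "2017-08-16", "2017-08-17",
                    "2017-08-17", "2017-08-18"],
    "eventCount": [24, 140, 150, 200, 20, 10, 15, 10, 50, 80],
})
events["Date & Time"] = pd.to_datetime(events["Date & Time"])

days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Sum per weekday name, then reindex so all seven days are present,
# in calendar order, with 0 for days that have no rows.
totals = (
    events.groupby(events["Date & Time"].dt.day_name())["eventCount"]
          .sum()
          .reindex(days, fill_value=0)
)
print(totals)
```

With matplotlib installed, totals.plot(kind="bar") should then draw one bar per weekday in Monday-to-Sunday order.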