Group by different columns

Time: 2018-03-15 15:23:12

Tags: pandas datetime group-by

My DataFrame looks like this:

    StationID   DateTime    Channel Count
0   1   2017-10-01 00:00:00 1   1
1   1   2017-10-01 00:00:00 1   201
2   1   2017-10-01 00:00:00 1   8
3   1   2017-10-01 00:00:00 1   2
4   1   2017-10-01 00:00:00 1   0
5   1   2017-10-01 00:00:00 1   0
6   1   2017-10-01 00:00:00 1   0
7   1   2017-10-01 00:00:00 1   0

... and so on. I want to group the values by hour, by Channel, and by StationID.

Required output:

Station ID DateTime       Channel    Count  
1   2017-10-01 00:00:00    1          232
1   2017-10-01 00:01:00    1          23
2   2017-10-01 00:00:00    1          244...

... and so on

2 answers:

Answer 0 (score: 1)

I think you need groupby with the aggregation sum. To group the datetimes by hour, apply floor to them: it sets the minutes and seconds to 0.

print (df)
   StationID             DateTime  Channel  Count
0          1  2017-12-01 00:00:00        1      1
1          1  2017-12-01 00:00:00        1    201
2          1  2017-12-01 00:10:00        1      8
3          1  2017-12-01 10:00:00        1      2
4          1  2017-10-01 10:50:00        1      0
5          1  2017-10-01 10:20:00        1      5
6          1  2017-10-01 08:10:00        1      4
7          1  2017-10-01 08:00:00        1      1

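import pandas as pd

# make sure the column holds real datetimes, not strings
df['DateTime'] = pd.to_datetime(df['DateTime'])

# group by station, hour (datetimes floored to the hour) and channel, then sum Count
df1 = (df.groupby(['StationID', df['DateTime'].dt.floor('H'), 'Channel'])['Count']
        .sum()
        .reset_index()
        )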
print (df1)
   StationID            DateTime  Channel  Count
0          1 2017-10-01 08:00:00        1      5
1          1 2017-10-01 10:00:00        1      5
2          1 2017-12-01 00:00:00        1    210
3          1 2017-12-01 10:00:00        1      2

print (df['DateTime'].dt.floor('H'))
0   2017-12-01 00:00:00
1   2017-12-01 00:00:00
2   2017-12-01 00:00:00
3   2017-12-01 10:00:00
4   2017-10-01 10:00:00
5   2017-10-01 10:00:00
6   2017-10-01 08:00:00
7   2017-10-01 08:00:00
Name: DateTime, dtype: datetime64[ns]
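
For anyone who wants to run this directly, here is a self-contained sketch of the same approach. The rows are copied from the printed DataFrame above; the name `hourly` is only illustrative. It uses the lowercase `'h'` alias, which, as far as I know, newer pandas releases prefer over `'H'`.

```python
import pandas as pd

# Sample rows copied from the printed DataFrame above
df = pd.DataFrame({
    "StationID": [1] * 8,
    "DateTime": [
        "2017-12-01 00:00:00", "2017-12-01 00:00:00", "2017-12-01 00:10:00",
        "2017-12-01 10:00:00", "2017-10-01 10:50:00", "2017-10-01 10:20:00",
        "2017-10-01 08:10:00", "2017-10-01 08:00:00",
    ],
    "Channel": [1] * 8,
    "Count": [1, 201, 8, 2, 0, 5, 4, 1],
})
df["DateTime"] = pd.to_datetime(df["DateTime"])

# Floor each timestamp to the start of its hour, then sum Count per
# (StationID, hour, Channel) combination
hourly = (
    df.groupby(["StationID", df["DateTime"].dt.floor("h"), "Channel"])["Count"]
      .sum()
      .reset_index()
)
print(hourly)
```

Because the floored Series keeps the name `DateTime`, the resulting column is still called `DateTime` after `reset_index()`.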

But if the dates don't matter and only the hour of day is needed, use hour:

df2 = (df.groupby(['StationID', df['DateTime'].dt.hour, 'Channel'])['Count']
        .sum()
        .reset_index() 
        )
print (df2)
   StationID  DateTime  Channel  Count
0          1         0        1    210
1          1         8        1      5
2          1        10        1      7
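
Note that in the output above the column holding the hours is still named `DateTime`. A small variation, sketched here under the assumption that a clearer column name is wanted, renames the grouping Series first. The names `Hour`, `hour_of_day` and `by_hour` are mine, not part of the original answer.

```python
import pandas as pd

# Same sample rows as in the printed DataFrame above
df = pd.DataFrame({
    "StationID": [1] * 8,
    "DateTime": pd.to_datetime([
        "2017-12-01 00:00:00", "2017-12-01 00:00:00", "2017-12-01 00:10:00",
        "2017-12-01 10:00:00", "2017-10-01 10:50:00", "2017-10-01 10:20:00",
        "2017-10-01 08:10:00", "2017-10-01 08:00:00",
    ]),
    "Channel": [1] * 8,
    "Count": [1, 201, 8, 2, 0, 5, 4, 1],
})

# Extract the hour of day (0-23) and give the Series its own name so the
# output column is not confusingly called "DateTime"
hour_of_day = df["DateTime"].dt.hour.rename("Hour")

by_hour = (
    df.groupby(["StationID", hour_of_day, "Channel"])["Count"]
      .sum()
      .reset_index()
)
print(by_hour)
```

This merges rows from different days that fall in the same hour, which is why hour 10 sums to 7 here rather than 5 and 2 separately.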

Answer 1 (score: 0)

Or you can use Grouper:

df.groupby([pd.Grouper(key='DateTime', freq='H'), 'Channel', 'StationID'])['Count'].sum()
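
A runnable sketch of the Grouper variant, using the sample rows from the first answer. It assumes `DateTime` has already been converted to a datetime dtype, which `pd.Grouper` with `freq` requires. The name `grouped` is illustrative, and the lowercase `'h'` alias is used because, as far as I know, newer pandas releases prefer it over `'H'`.

```python
import pandas as pd

# Sample rows taken from the first answer
df = pd.DataFrame({
    "StationID": [1] * 8,
    "DateTime": pd.to_datetime([
        "2017-12-01 00:00:00", "2017-12-01 00:00:00", "2017-12-01 00:10:00",
        "2017-12-01 10:00:00", "2017-10-01 10:50:00", "2017-10-01 10:20:00",
        "2017-10-01 08:10:00", "2017-10-01 08:00:00",
    ]),
    "Channel": [1] * 8,
    "Count": [1, 201, 8, 2, 0, 5, 4, 1],
})

# All grouping keys go in ONE list; pd.Grouper bins DateTime into hourly buckets
grouped = df.groupby(
    ["StationID", pd.Grouper(key="DateTime", freq="h"), "Channel"]
)["Count"].sum()
print(grouped)

# The result is a Series with a MultiIndex; look up one group by its key tuple
print(grouped.loc[(1, pd.Timestamp("2017-12-01 00:00:00"), 1)])
```

Call `.reset_index()` on the result if a flat DataFrame like the one in the first answer is preferred.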