I'm trying to bin using the "hour" column, but it doesn't work

Time: 2020-08-11 15:08:44

Tags: python pandas bucket

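I have a DataFrame with a series of scores recorded at different times, and I want to group them into buckets for each hour of the day (from 00:00:00 to 24:00:00).

This is part of the df, which I call dfH: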

     Hora de início Rodada
00:00:00     636
00:00:07    1184
00:00:09     680
00:00:23     651
00:00:30     539
00:01:16    1076
00:01:44     925
00:02:00     229
00:02:48     452
00:03:06    1143
00:03:55     401
00:04:10    1148
00:04:20     677
00:04:26     552
00:05:10    1182
00:05:44     677
00:06:03     657
00:06:23    1172
00:06:34     428
00:06:59     662
00:07:05    1131
00:07:30     675
00:07:53    1175
00:08:06    1121
00:08:33     564
00:08:43     673
00:08:45     670
00:09:06    1014
00:09:17     449
00:09:19    1156
Name: (TOTAL ESTRELAS, TOTAL), dtype: int64

I am trying:

bins = np.arange(0,24,1)

groups = dfH.groupby(pd.cut(dfH, bins)).sum()

But I get:

(TOTAL ESTRELAS, TOTAL)
(0, 1]      0
(1, 2]      0
(2, 3]      0
(3, 4]      0
(4, 5]      0
(5, 6]      0
(6, 7]      0
(7, 8]      0
(8, 9]      0
(9, 10]     0
(10, 11]    0
(11, 12]    0
(12, 13]    0
(13, 14]    0
(14, 15]    0
(15, 16]    0
(16, 17]    0
(17, 18]    0
(18, 19]    0
(19, 20]    0
(20, 21]    0
(21, 22]    0
(22, 23]    0
Name: (TOTAL ESTRELAS, TOTAL), dtype: int64

Maybe the index is not in an hour format, so I tried:

dfH.index = pd.to_datetime(dfH.index, format='%H:%M:%S').dtype.hour

But I got the error:

ValueError: time data 'TOTAL' does not match format '%H:%M:%S' (match)

1 Answer:

Answer 0: (score: 0)

Try:

dfH.resample("1h").sum()

if your index is a datetime index.
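
For reference, here is a minimal sketch of how the pieces might fit together. It assumes the index holds time-of-day strings plus a stray non-time label such as 'TOTAL' (which is what the ValueError suggests); the sample values and the name "score" are made up for illustration. Note that pd.cut(dfH, bins) bins the score values rather than the times, which is probably why every bucket in the question came out as 0.

```python
import pandas as pd

# Hypothetical sample resembling the question: time strings as the index,
# plus a non-time 'TOTAL' label (an assumption based on the error message).
dfH = pd.Series(
    [636, 1184, 680, 229, 452],
    index=["00:00:00", "00:00:07", "01:15:09", "01:59:59", "TOTAL"],
    name="score",
)

# Parse the labels; anything that is not HH:MM:SS (e.g. 'TOTAL') becomes NaT.
parsed = pd.to_datetime(dfH.index, format="%H:%M:%S", errors="coerce")

# Keep only the rows whose label parsed, and install the DatetimeIndex.
mask = parsed.notna()
clean = dfH[mask]
clean.index = parsed[mask]

# One bucket per hour, as suggested in the answer.
hourly = clean.resample("1h").sum()

# Alternative: group by the hour of day directly (integer labels 0..23).
by_hour = clean.groupby(clean.index.hour).sum()

print(hourly)
print(by_hour)
```

With resample the result is labeled by timestamps on the placeholder date 1900-01-01 that to_datetime assigns when only a time is given; grouping by clean.index.hour gives plain integer hour labels instead.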