I have a pandas DataFrame (stored in a .csv file) in the following format:
val,date,time
0.001,01JAN90,0:00:00
0.002,01JAN90,0:01:00
0.005,01JAN90,0:02:00
0.056,01JAN90,0:03:00
...
0.067,31DEC90,23:55:00
0.007,31DEC90,23:56:00
0.006,31DEC90,23:57:00
0.004,31DEC90,23:58:00
0.003,31DEC90,23:59:00
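That is, a single float (the val column) for every minute (the time column) of every day (the date column) of one year. I need to group the val elements, across the whole year, by the hour range they fall into. I define the 15 hour ranges as: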
t_range = [['5:30:00', '6:30:00'], ['6:30:00', '7:30:00'], ...,
['19:30:00', '20:30:00']]
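The answer given in "Pandas Groupby Range of Values" handles ranges defined as floats, but my ranges are defined as strings.
My thinking is that I first need to convert all HH:MM:SS values in time to floats, and then apply the linked groupby-based solution. Is this the right approach? If not, how should I do this with pandas?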
Answer 0 (score: 2)
IIUC, you can do it this way:
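import numpy as np
import pandas as pd

# df is the frame loaded from the .csv file
# range limits and bin width, in minutes since midnight
start = 5*60+30    # 05:30
end = 20*60+30     # 20:30
step = 60
df['ts'] = pd.to_datetime(df['date'] + ' ' + df['time'], format='%d%b%y %H:%M:%S')
df['mins'] = df['ts'].dt.hour*60 + df['ts'].dt.minute
# filter out all "non-interesting" entries
x = df.query("@start <= mins <= @end")
bins = np.arange(start-step, end+step, step)
# label each bin "(HH:MM:00, HH:MM:00]" from its left edge b and right edge b+step
labels = ['({0[0]:02d}:{0[1]:02d}:00, {1[0]:02d}:{1[1]:02d}:00]'.format(divmod(b, 60), divmod(b+step, 60))
          for b in bins[:-1]]
# sum val per bin; on recent pandas versions empty bins may come back as 0 instead of NaN,
# in which case sum(min_count=1) keeps them NaN so that dropna() removes them
x.groupby(pd.cut(x['mins'], bins=bins, labels=labels))['val'].sum().dropna()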
Result:
In [164]: x.groupby(pd.cut(x['mins'], bins=bins, labels=labels))['val'].sum().dropna()
Out[164]:
mins
(05:30:00, 06:30:00] 0.006
(06:30:00, 07:30:00] 0.004
(07:30:00, 08:30:00] 0.003
(08:30:00, 09:30:00] 0.111
(09:30:00, 10:30:00] 0.001
(10:30:00, 11:30:00] 0.002
(11:30:00, 12:30:00] 0.005
(12:30:00, 13:30:00] 0.056
Name: val, dtype: float64
Source DF:
In [166]: df
Out[166]:
val date time
0 0.067 01DEC90 04:00:00
1 0.007 01DEC90 05:00:00
2 0.006 01DEC90 06:00:00
3 0.004 01DEC90 07:00:00
4 0.003 01DEC90 08:00:00
5 0.111 01DEC90 09:00:00
6 0.001 01JAN90 10:00:00
7 0.002 01JAN90 11:00:00
8 0.005 01JAN90 12:00:00
9 0.056 01JAN90 13:00:00
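To check the ts and mins columns that the grouping relies on, here is a minimal sketch on three rows taken from the frame above. It only assumes that date looks like 01DEC90 and time like HH:MM:SS, as in the question.

```python
import pandas as pd

# Three rows copied from the sample frame above.
df = pd.DataFrame({
    'val': [0.067, 0.007, 0.001],
    'date': ['01DEC90', '01DEC90', '01JAN90'],
    'time': ['04:00:00', '05:00:00', '10:00:00'],
})

# %d%b%y parses day, abbreviated month name and two-digit year (90 -> 1990).
df['ts'] = pd.to_datetime(df['date'] + ' ' + df['time'], format='%d%b%y %H:%M:%S')

# Minutes since midnight; seconds are ignored.
df['mins'] = df['ts'].dt.hour * 60 + df['ts'].dt.minute

print(df[['ts', 'mins']])
```

The date part has no effect on mins, so rows from different days with the same time of day end up in the same bin, which is what grouping over the whole year needs.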
Explanation:
bins (in minutes since midnight):
In [181]: bins
Out[181]: array([ 270, 330, 390, 450, 510, 570, 630, 690, 750, 810, 870, 930, 990, 1050, 1110, 1170, 1230])
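To show how these edges are used, here is a small sketch of what pd.cut does with them. The four minute values are made up for illustration; the rest mirrors the code above.

```python
import numpy as np
import pandas as pd

start, end, step = 5 * 60 + 30, 20 * 60 + 30, 60
bins = np.arange(start - step, end + step, step)  # 270, 330, ..., 1230

# Illustrative minutes since midnight: 05:30, 05:31, 06:30 and 20:30.
mins = pd.Series([330, 331, 390, 1230])

# With the default right=True every interval is (left, right].
binned = pd.cut(mins, bins=bins)

# Left edge of the interval each value was assigned to.
left_edges = [int(iv.left) for iv in binned]
print(left_edges)
```

Because the intervals are closed on the right, a reading at exactly 05:30 falls into (04:30, 05:30] and not into (05:30, 06:30]. That is presumably why bins starts one step before start: the query keeps mins == start, and that value needs a bin to land in.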
labels:
In [182]: labels
Out[182]:
['(04:30:00, 05:30:00]',
'(05:30:00, 06:30:00]',
'(06:30:00, 07:30:00]',
'(07:30:00, 08:30:00]',
'(08:30:00, 09:30:00]',
'(09:30:00, 10:30:00]',
'(10:30:00, 11:30:00]',
'(11:30:00, 12:30:00]',
'(12:30:00, 13:30:00]',
'(13:30:00, 14:30:00]',
'(14:30:00, 15:30:00]',
'(15:30:00, 16:30:00]',
'(16:30:00, 17:30:00]',
'(17:30:00, 18:30:00]',
'(18:30:00, 19:30:00]',
'(19:30:00, 20:30:00]']
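If you would rather start from the string ranges in the question than from start/end/step, something along these lines might work. This is only a sketch: it assumes the ranges are contiguous and sorted, the t_range below is shortened to three entries, and the four-row frame and the to_minutes helper are made up for illustration.

```python
import pandas as pd

# Shortened, illustrative version of t_range from the question.
t_range = [['5:30:00', '6:30:00'], ['6:30:00', '7:30:00'], ['7:30:00', '8:30:00']]

# Hypothetical sample data with only the columns needed here.
df = pd.DataFrame({
    'val': [0.007, 0.006, 0.004, 0.003],
    'time': ['05:00:00', '06:00:00', '07:00:00', '08:00:00'],
})

def to_minutes(hms):
    """Convert an 'H:MM:SS' string to minutes since midnight."""
    return pd.to_timedelta(hms).total_seconds() / 60

# Contiguous ranges share their edges, so the distinct values form the bin edges.
edges = sorted({to_minutes(t) for pair in t_range for t in pair})
labels = ['({}, {}]'.format(lo, hi) for lo, hi in t_range]

mins = pd.to_timedelta(df['time']).dt.total_seconds() / 60
groups = pd.cut(mins, bins=edges, labels=labels)

# observed=True keeps only bins that actually contain rows;
# rows outside every range (here 05:00) get NaN and are left out.
totals = df.groupby(groups, observed=True)['val'].sum()
print(totals)
```

So yes, converting the time strings to numbers first and then cutting is a reasonable approach; the numbers just do not have to come from a hand-written parser.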