我有一个pandas数据帧,其中列的速度单位为KmH,还有一列时间戳:
Date, Speed
2016-07-07 13:38:02.000, 50.718590
2016-07-18 11:28:00.000, 2.357645
2016-07-15 15:03:08.000, 14.652172
2016-07-18 06:53:00.000, 24.530390
... ...
2016-07-18 18:41:31.000, 31.761416
2016-07-14 05:28:42.187, 7.532758
我想要的是平均每天15分钟harmonic average speed:
Time, Speed
00:00, 32
00:15, 10
00:30, 12
00:45, 41
01:00, 12
...
23:30, 30
23:45, 31
我最初的尝试是从每个时间戳中删除日期,将其设置为索引,然后使用TimeGrouper查找平均值。 (我的数据框称为输出)代码是:
output['Speed'] = output['Speed']**-1
output['Date'] = output['Date'].apply( lambda d : d.time() )
output = output.set_index(['Date'])
output = output.groupby(pd.TimeGrouper('15Min')).mean()
output['Speed'] = output['Speed']**-1
但是代码没有用,因为它给了我一个错误:
Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
答案 0 :(得分:2)
我认为您要做的是规范化日期,然后进行重新取样:
In [177]:
df['Date'] = pd.to_datetime(df['Date'].dt.strftime('%H:%M:%S'))
df
Out[177]:
Date Speed
0 2017-02-07 13:38:02 50.718590
1 2017-02-07 11:28:00 2.357645
2 2017-02-07 15:03:08 14.652172
3 2017-02-07 06:53:00 24.530390
4 2017-02-07 18:41:31 31.761416
5 2017-02-07 05:28:42 7.532758
现在所有日期都是相同的,默认情况下是今天的日期,然后按照自己的意愿行事:
In [178]:
output = df.set_index('Date')
output = output.groupby(pd.TimeGrouper('15Min')).mean()
output['Speed'] = output['Speed']**-1
output
Out[178]:
Speed
Date
2017-02-07 05:15:00 0.132754
2017-02-07 05:30:00 NaN
2017-02-07 05:45:00 NaN
2017-02-07 06:00:00 NaN
2017-02-07 06:15:00 NaN
2017-02-07 06:30:00 NaN
2017-02-07 06:45:00 0.040766
2017-02-07 07:00:00 NaN
2017-02-07 07:15:00 NaN
2017-02-07 07:30:00 NaN
2017-02-07 07:45:00 NaN
2017-02-07 08:00:00 NaN
2017-02-07 08:15:00 NaN
2017-02-07 08:30:00 NaN
2017-02-07 08:45:00 NaN
2017-02-07 09:00:00 NaN
2017-02-07 09:15:00 NaN
2017-02-07 09:30:00 NaN
2017-02-07 09:45:00 NaN
2017-02-07 10:00:00 NaN
2017-02-07 10:15:00 NaN
2017-02-07 10:30:00 NaN
2017-02-07 10:45:00 NaN
2017-02-07 11:00:00 NaN
2017-02-07 11:15:00 0.424152
2017-02-07 11:30:00 NaN
2017-02-07 11:45:00 NaN
2017-02-07 12:00:00 NaN
2017-02-07 12:15:00 NaN
2017-02-07 12:30:00 NaN
2017-02-07 12:45:00 NaN
2017-02-07 13:00:00 NaN
2017-02-07 13:15:00 NaN
2017-02-07 13:30:00 0.019717
2017-02-07 13:45:00 NaN
2017-02-07 14:00:00 NaN
2017-02-07 14:15:00 NaN
2017-02-07 14:30:00 NaN
2017-02-07 14:45:00 NaN
2017-02-07 15:00:00 0.068249
2017-02-07 15:15:00 NaN
2017-02-07 15:30:00 NaN
2017-02-07 15:45:00 NaN
2017-02-07 16:00:00 NaN
2017-02-07 16:15:00 NaN
2017-02-07 16:30:00 NaN
2017-02-07 16:45:00 NaN
2017-02-07 17:00:00 NaN
2017-02-07 17:15:00 NaN
2017-02-07 17:30:00 NaN
2017-02-07 17:45:00 NaN
2017-02-07 18:00:00 NaN
2017-02-07 18:15:00 NaN
2017-02-07 18:30:00 0.031485
所以这个:
df['Date'] = pd.to_datetime(df['Date'].dt.strftime('%H:%M:%S'))
这样做的目的是使用dt.strftime
提取字符串,我们可以使用to_datetime
创建一个datetime64系列,其中所有日期都相同