My DataFrame looks like this:
read value
0 2013-01-07 05:00:00 29.0
1 2013-01-08 15:00:00 4034.0
2 2013-01-09 20:00:00 256340.0
3 2013-01-10 20:00:00 343443.0
4 2013-01-11 20:00:00 4642435.0
5 2013-01-12 15:00:00 544296.0
6 2013-01-13 20:00:00 700000.0
7 2013-01-14 20:00:00 782335.0
8 2013-01-15 19:00:00 900000.0
9 2013-01-16 20:00:00 959130.0
10 2013-01-17 19:00:00 1114343.0
11 2013-01-18 20:00:00 1146230.0
12 2013-01-19 20:00:00 1247793.0
13 2013-01-20 20:00:00 1343376.0
I want to transform and normalize it so that it shows hourly consumption over time. So far I have the following:
import numpy as np
import pandas as pd
# calculate the number of hours elapsed since the previous reading
current['hour_delta'] = (current['read'] - current['read'].shift()).fillna(pd.Timedelta(0)).dt.total_seconds() / 3600
# add the end date, then the amount per hour
current['end_date'] = current['read'] + pd.to_timedelta(current['hour_delta'], unit='h')
current['infer_hour'] = current['value'] / current['hour_delta']
Then I created the series:
#create hourly time series
result = pd.Series(0, index=pd.date_range(start=current['read'].min(), end=current['read'].max(), freq='h'))
However, from here I can't figure out how to apply the hourly rate to the series.
Answer 0 (score: 3)
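You can take the cumulative sum of value and resample it hourly on the read column. Then interpolate to fill the nulls, and finally take the difference between each row and the previous one.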
For example, there are 34 hours between 2013-01-07 05:00:00 and 2013-01-08 15:00:00. If 4034 has to be distributed over those 34 hours, the hourly average should be 4034 / 34, or 118.647059.
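As a quick check of that arithmetic, here is a minimal sketch using the two timestamps and the value from the question. The variable names (start, end, hours, hourly_rate) are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Two consecutive readings taken from the question's data
start = pd.Timestamp("2013-01-07 05:00:00")
end = pd.Timestamp("2013-01-08 15:00:00")

# Elapsed time between the readings, expressed as a number of hours
hours = (end - start) / pd.Timedelta(hours=1)

# Spread the second reading's value evenly over that interval
hourly_rate = 4034 / hours

print(hours, round(hourly_rate, 6))
```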
current.set_index('read').value.cumsum().resample('h').sum(min_count=1).interpolate().diff()
read
2013-01-07 05:00:00 NaN
2013-01-07 06:00:00 118.647059
2013-01-07 07:00:00 118.647059
2013-01-07 08:00:00 118.647059
2013-01-07 09:00:00 118.647059
2013-01-07 10:00:00 118.647059
2013-01-07 11:00:00 118.647059
2013-01-07 12:00:00 118.647059
2013-01-07 13:00:00 118.647059
2013-01-07 14:00:00 118.647059
2013-01-07 15:00:00 118.647059
2013-01-07 16:00:00 118.647059
2013-01-07 17:00:00 118.647059
2013-01-07 18:00:00 118.647059
2013-01-07 19:00:00 118.647059
...
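For anyone who wants to reproduce this end to end, the following is a rough sketch that rebuilds only the first three rows of the question's data and applies the same chain step by step. It assumes a reasonably recent pandas, where the sum of an empty resample bin is 0 unless min_count=1 is passed. The intermediate names (cumulative, hourly_total, hourly_rate) are made up for illustration.

```python
import pandas as pd

# First three rows of the question's data (the remaining rows are omitted here)
current = pd.DataFrame({
    "read": pd.to_datetime([
        "2013-01-07 05:00:00",
        "2013-01-08 15:00:00",
        "2013-01-09 20:00:00",
    ]),
    "value": [29.0, 4034.0, 256340.0],
})

# Running total of consumption at each reading
cumulative = current.set_index("read")["value"].cumsum()

# Put the running total on an hourly grid; with min_count=1,
# hours that have no reading become NaN instead of 0
hourly_total = cumulative.resample("h").sum(min_count=1)

# Fill the gaps linearly, then difference consecutive hours
# to get the consumption attributed to each hour
hourly_rate = hourly_total.interpolate().diff()

print(hourly_rate.head())
```

The first row is NaN because there is no earlier hour to difference against. Each later hour carries an equal share of the reading that closes its interval.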