I have a dataset like the one shown below. Each row has a start time and an end time, along with the corresponding values.
Block_start Block_end Total Coal Waste
01/20/2016 5:00 01/20/2016 5:23 1284 0 1284
01/20/2016 5:23 01/20/2016 6:44 5755 0 5755
01/20/2016 6:44 01/20/2016 8:21 8058 0 8058
01/20/2016 8:21 01/20/2016 10:04 8584 0 8584
01/20/2016 10:04 01/20/2016 11:49 8790 0 8790
01/20/2016 11:49 01/20/2016 12:58 3437 0 3437
01/20/2016 12:58 01/20/2016 16:52 19532 0 19532
01/20/2016 16:52 01/20/2016 21:15 21925 0 21925
01/20/2016 21:15 01/21/2016 1:47 22636 0 22636
01/21/2016 1:47 01/21/2016 5:07 16701 0 16701
01/21/2016 5:07 01/21/2016 11:55 10205 0 10205
01/21/2016 11:55 01/21/2016 17:07 25965 0 25965
01/21/2016 17:07 01/21/2016 22:09 25188 0 25188
01/21/2016 22:09 01/22/2016 3:41 27666 0 27666
01/22/2016 3:41 01/22/2016 8:01 21698 0 21698
01/22/2016 8:01 01/22/2016 15:34 11315 0 11315
01/22/2016 15:34 01/22/2016 19:55 21778 0 21778
01/22/2016 19:55 01/23/2016 0:25 22481 0 22481
...
I want to sum these values at an 8-hour frequency, with label='left' and the bins starting at 5 AM. I set the index to 'Block_end' and tried to resample. Here is what I tried:
df = df.set_index('Block_end')  # set_index returns a new DataFrame, so assign it back
df_resamped = df.resample('8H', closed='left', label='left', base=5).sum()
But the result (below) is not what I want.
Block_end Total Coal Waste
2016-01-20 13:00:00 35908 0 35908
2016-01-20 21:00:00 19532 0 19532
2016-01-21 05:00:00 44561 0 44561
2016-01-21 13:00:00 26906 0 26906
2016-01-21 21:00:00 25965 0 25965
2016-01-22 05:00:00 52854 0 52854
2016-01-22 13:00:00 21698 0 21698
2016-01-22 21:00:00 33093 0 33093
2016-01-23 05:00:00 44774 0 44774
...
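I want a block that straddles a bin boundary, such as the one ending at 01/20/2016 21:15, to be split: 15 minutes' worth goes to the following bin and the rest to the preceding one. pandas does not do this. It is a kind of interpolation.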
Answer 0: (score: 0)
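I'm not sure what result you want, but if you only need to interpolate between the values at the desired timestamps, I believe there is no need to sum.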
Start with this DataFrame (make sure Block_end is your DatetimeIndex):
df
Out[175]:
Block_start Total Coal Waste
Block_end
2016-01-20 05:23:00 01/20/2016 5:00 1284 0 1284
2016-01-20 06:44:00 01/20/2016 5:23 5755 0 5755
2016-01-20 08:21:00 01/20/2016 6:44 8058 0 8058
2016-01-20 10:04:00 01/20/2016 8:21 8584 0 8584
2016-01-20 11:49:00 01/20/2016 10:04 8790 0 8790
2016-01-20 12:58:00 01/20/2016 11:49 3437 0 3437
2016-01-20 16:52:00 01/20/2016 12:58 19532 0 19532
2016-01-20 21:15:00 01/20/2016 16:52 21925 0 21925
2016-01-21 01:47:00 01/20/2016 21:15 22636 0 22636
2016-01-21 05:07:00 01/21/2016 1:47 16701 0 16701
2016-01-21 11:55:00 01/21/2016 5:07 10205 0 10205
2016-01-21 17:07:00 01/21/2016 11:55 25965 0 25965
2016-01-21 22:09:00 01/21/2016 17:07 25188 0 25188
2016-01-22 03:41:00 01/21/2016 22:09 27666 0 27666
2016-01-22 08:01:00 01/22/2016 3:41 21698 0 21698
2016-01-22 15:34:00 01/22/2016 8:01 11315 0 11315
2016-01-22 19:55:00 01/22/2016 15:34 21778 0 21778
2016-01-23 00:25:00 01/22/2016 19:55 22481 0 22481
First, define the desired result range:
rng = pd.date_range(start=pd.Timestamp('2016-01-20 13:00'), end=pd.Timestamp('2016-01-22 21:00'), freq='8 h')
Then resample the DataFrame to one-minute frequency, fill the gaps with DataFrame.interpolate(), and reindex to the desired range:
df_resamped = df.resample('min').interpolate().reindex(rng)
df_resamped
Out[178]:
Total Coal Waste
2016-01-20 13:00:00 3574.564103 0 3574.564103
2016-01-20 21:00:00 21788.517110 0 21788.517110
2016-01-21 05:00:00 16908.725000 0 16908.725000
2016-01-21 13:00:00 13488.333333 0 13488.333333
2016-01-21 21:00:00 25365.526490 0 25365.526490
2016-01-22 05:00:00 25852.646154 0 25852.646154
2016-01-22 13:00:00 14844.761589 0 14844.761589
2016-01-22 21:00:00 21947.240741 0 21947.240741
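Note that the interpolation above estimates a value at each 8-hour timestamp. It does not sum anything. If the goal is instead the split described in the question, where a block that crosses a bin edge contributes to both bins in proportion to the time spent in each, something like the following might work. This is only a minimal sketch: the helper name allocate_to_bins is made up for illustration, it assumes each block's value accrues uniformly over the block's duration, and it assumes every block has a positive duration. It uses four rows from the question's data around the 13:00 and 21:00 edges.

```python
import pandas as pd

# Four rows from the question's data, around the 13:00 and 21:00 bin edges.
blocks = pd.DataFrame({
    "Block_start": pd.to_datetime(["2016-01-20 10:04", "2016-01-20 11:49",
                                   "2016-01-20 12:58", "2016-01-20 16:52"]),
    "Block_end": pd.to_datetime(["2016-01-20 11:49", "2016-01-20 12:58",
                                 "2016-01-20 16:52", "2016-01-20 21:15"]),
    "Total": [8790, 3437, 19532, 21925],
    "Coal": [0, 0, 0, 0],
    "Waste": [8790, 3437, 19532, 21925],
})


def allocate_to_bins(blocks, edges, value_cols):
    """Hypothetical helper: split each block's values across the bins it
    overlaps, weighted by overlap time (assumes a uniform rate per block)."""
    zero = pd.Timedelta(0)
    duration = blocks["Block_end"] - blocks["Block_start"]  # assumed > 0
    rows = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Clamp each block to the bin [lo, hi) to measure the overlap.
        start = blocks["Block_start"].where(blocks["Block_start"] > lo, lo)
        end = blocks["Block_end"].where(blocks["Block_end"] < hi, hi)
        overlap = end - start
        overlap = overlap.where(overlap > zero, zero)  # no overlap -> 0
        share = overlap / duration  # fraction of the block inside this bin
        rows[lo] = blocks[value_cols].mul(share, axis=0).sum()
    # One row per bin, labelled by the bin's left edge.
    return pd.DataFrame(rows).T


# 8-hour bin edges anchored at 05:00, as in the question.
edges = pd.date_range("2016-01-20 05:00", "2016-01-21 05:00", freq="8h")
result = allocate_to_bins(blocks, edges, ["Total", "Coal", "Waste"])
print(result)
```

With this weighting, the 16:52 to 21:15 block (263 minutes) puts 15/263 of its total into the bin that starts at 21:00 and the rest into the bin that starts at 13:00, and the column sums are unchanged by the split.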