I have the following data frame:
Date from Date to Actuals
4669 2017-12-22 06:00:00 2017-12-22 06:05:00 75
4670 2017-12-22 06:05:00 2017-12-22 06:10:00 81
4671 2017-12-22 06:10:00 2017-12-22 06:15:00 84
4672 2017-12-22 06:15:00 2017-12-22 06:20:00 78
4673 2017-12-22 06:20:00 2017-12-22 06:25:00 93
4674 2017-12-22 06:25:00 2017-12-22 06:30:00 93
4675 2017-12-22 06:30:00 2017-12-22 06:35:00 99
4676 2017-12-22 06:35:00 2017-12-22 06:40:00 102
4677 2017-12-22 06:40:00 2017-12-22 06:45:00 102
4678 2017-12-22 06:45:00 2017-12-22 06:50:00 108
4679 2017-12-22 06:50:00 2017-12-22 06:55:00 129
4680 2017-12-22 06:55:00 2017-12-22 07:00:00 135
4681 2017-12-22 07:00:00 2017-12-22 07:05:00 126
4682 2017-12-22 07:05:00 2017-12-22 07:10:00 111
4683 2017-12-22 07:10:00 2017-12-22 07:15:00 96
4684 2017-12-22 07:15:00 2017-12-22 07:20:00 111
4685 2017-12-22 07:20:00 2017-12-22 07:25:00 105
4686 2017-12-22 07:25:00 2017-12-22 07:30:00 99
4687 2017-12-22 07:30:00 2017-12-22 07:35:00 111
4688 2017-12-22 07:35:00 2017-12-22 07:40:00 129
4689 2017-12-22 07:40:00 2017-12-22 07:45:00 123
4690 2017-12-22 07:45:00 2017-12-22 07:50:00 138
4691 2017-12-22 07:50:00 2017-12-22 07:55:00 141
4692 2017-12-22 07:55:00 2017-12-22 08:00:00 156
4693 2017-12-22 08:00:00 2017-12-22 08:05:00 147
4694 2017-12-22 08:05:00 2017-12-22 08:10:00 120
4695 2017-12-22 08:10:00 2017-12-22 08:15:00 99
4696 2017-12-22 08:15:00 2017-12-22 08:20:00 75
4697 2017-12-22 08:20:00 2017-12-22 08:25:00 57
4698 2017-12-22 08:25:00 2017-12-22 08:30:00 45
... ... ...
53855 2018-10-08 03:30:00 2018-10-08 03:35:00 0
53856 2018-10-08 03:35:00 2018-10-08 03:40:00 0
53857 2018-10-08 03:40:00 2018-10-08 03:45:00 0
53858 2018-10-08 03:45:00 2018-10-08 03:50:00 0
53859 2018-10-08 03:50:00 2018-10-08 03:55:00 0
53860 2018-10-08 03:55:00 2018-10-08 04:00:00 0
53861 2018-10-08 04:00:00 2018-10-08 04:05:00 0
53862 2018-10-08 04:05:00 2018-10-08 04:10:00 0
53863 2018-10-08 04:10:00 2018-10-08 04:15:00 0
53864 2018-10-08 04:15:00 2018-10-08 04:20:00 0
53865 2018-10-08 04:20:00 2018-10-08 04:25:00 0
53866 2018-10-08 04:25:00 2018-10-08 04:30:00 0
53867 2018-10-08 04:30:00 2018-10-08 04:35:00 0
53868 2018-10-08 04:35:00 2018-10-08 04:40:00 0
53869 2018-10-08 04:40:00 2018-10-08 04:45:00 0
53870 2018-10-08 04:45:00 2018-10-08 04:50:00 0
53871 2018-10-08 04:50:00 2018-10-08 04:55:00 0
53872 2018-10-08 04:55:00 2018-10-08 05:00:00 0
53873 2018-10-08 05:00:00 2018-10-08 05:05:00 0
53874 2018-10-08 05:05:00 2018-10-08 05:10:00 0
53875 2018-10-08 05:10:00 2018-10-08 05:15:00 0
53876 2018-10-08 05:15:00 2018-10-08 05:20:00 0
53877 2018-10-08 05:20:00 2018-10-08 05:25:00 0
53878 2018-10-08 05:25:00 2018-10-08 05:30:00 0
53879 2018-10-08 05:30:00 2018-10-08 05:35:00 0
53880 2018-10-08 05:35:00 2018-10-08 05:40:00 0
53881 2018-10-08 05:40:00 2018-10-08 05:45:00 0
53882 2018-10-08 05:45:00 2018-10-08 05:50:00 0
53883 2018-10-08 05:50:00 2018-10-08 05:55:00 1
53884 2018-10-08 05:55:00 2018-10-08 06:00:00 0
[83324 rows x 3 columns]
I want to add up the rows so that I get the cumulative value for each hour. Desired result:
Date from Date to Actuals
1 2017-12-22 06:00:00 2017-12-22 07:00:00 1179
2 2017-12-22 07:00:00 2017-12-22 08:00:00 1157
... ... ...
1000 2018-10-08 05:00:00 2018-10-08 06:00:00 1
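I tried DataFrame.sum(), but I could only get it to sum an entire column, not sub-sections of the column based on the datetime. Any suggestions?
PS: In this case the data frame has one row every 5 minutes, but I imagine this should also be possible when that is not the case.
Edit: Using Statistic Dean's answer, I found out that this is not a perfectly filled data frame.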
Answer 0 (score: 3)
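A simple approach (the output is not structured exactly as you asked, but it is easy to manipulate) is to use pandas.Grouper to groupby the hour and then sum the Actuals, i.e.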
import pandas
import random

# Creating the data frame
d = pandas.date_range('2017-12-22 06:00:00', periods=50, freq='5min')
d1 = pandas.date_range('2017-12-22 06:05:00', periods=50, freq='5min')
d2 = random.sample(range(1000), 50)
df = pandas.DataFrame({'Date_From': d,
                       'Date_To': d1,
                       'Actuals': d2})

# Group the rows by the hour of Date_From and sum Actuals within each hour.
# 'H' is the hourly alias; pandas 2.2 and later prefer the lowercase 'h'.
(df
 .set_index('Date_From')
 .groupby(pandas.Grouper(freq='H'))['Actuals']
 .sum())
which gives:
Date_From
2017-12-22 06:00:00    5194
2017-12-22 07:00:00    5790
2017-12-22 08:00:00    5760
2017-12-22 09:00:00    6298
2017-12-22 10:00:00    1070
Freq: H, Name: Actuals, dtype: int64
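If you want the result in the three-column layout from the question, one possible way to reshape it is sketched below. This is only an illustration, not part of the answer above: the small frame reuses the first 14 Actuals values from the question, the column names "Date from" and "Date to" are taken from the question's printout, and the hourly "Date to" is assumed to be simply the hour start plus one hour. It groups on the start time floored to the hour instead of using pandas.Grouper.

```python
import pandas as pd

# Hypothetical sample: the question's first hour (12 rows) plus two rows of the next hour.
df = pd.DataFrame({
    "Date from": pd.date_range("2017-12-22 06:00:00", periods=14, freq="5min"),
    "Actuals": [75, 81, 84, 78, 93, 93, 99, 102, 102, 108, 129, 135, 126, 111],
})
df["Date to"] = df["Date from"] + pd.Timedelta(minutes=5)

# Floor every start time to its clock hour and sum Actuals within each hour.
hour_start = df["Date from"].dt.floor("60min")
hourly = (
    df.groupby(hour_start)["Actuals"]
      .sum()
      .rename_axis("Date from")
      .reset_index()
)

# Assumption: each hourly interval ends exactly one hour after it starts.
hourly["Date to"] = hourly["Date from"] + pd.Timedelta(hours=1)
hourly = hourly[["Date from", "Date to", "Actuals"]]
print(hourly)
```

Because the grouping key is computed from the timestamps rather than from row positions, this should also cope with missing 5-minute rows. Note that hours with no rows at all are left out of the result here, whereas the pandas.Grouper version reports them with a sum of 0.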
Answer 1 (score: 0)
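One thing you may notice is that you have to sum 12 rows at a time. So one solution is to walk through your data frame, summing 12 rows at a time, starting at the first row and stopping at the last one. You just need to be careful with the boundaries. Let's call your data frame df.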
import numpy as np
import pandas as pd

n = df.shape[0] // 12  # The number of rows you'll have (complete hours only)
cumulative = np.zeros(n)
date_from = []
date_to = []
# Now go through the dataframe 12 rows at a time
for i in range(n):
    cumulative[i] = df.iloc[12*i:12*(i+1), 2].sum()  # Get the sum for the hour
    date_from.append(df.iloc[12*i, 0])  # Get the starting instant
    date_to.append(df.iloc[12*i+11, 1])  # Get the ending instant
# Now create your new dataframe
new_df = pd.DataFrame({'Date_from': date_from, 'Date_to': date_to, 'Actuals': cumulative})
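As I said above, this only works with the right boundaries: the first row has to be the start of an hour, and the result only runs up to the last complete hour.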
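Since the question's edit mentions that the data frame is not perfectly filled, it may be worth checking those assumptions before relying on fixed blocks of 12 rows. The snippet below is a rough sketch of such a check, not part of the original answer; the sample frame (with the 06:10 row deliberately removed) and the names has_gaps, starts_on_hour and rows_per_hour are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: two hours of 5-minute rows with the 06:10 row missing.
starts = pd.date_range("2017-12-22 06:00:00", periods=24, freq="5min").delete(2)
df = pd.DataFrame({"Date_from": starts, "Actuals": 1})

# Gap check: consecutive start times should all be exactly 5 minutes apart.
steps = df["Date_from"].diff().dropna()
has_gaps = not (steps == pd.Timedelta(minutes=5)).all()

# Boundary check: the first row should start exactly on the hour.
first = df["Date_from"].iloc[0]
starts_on_hour = first == first.floor("60min")

# Number of rows in each clock hour; a completely filled hour has 12.
rows_per_hour = df.groupby(df["Date_from"].dt.floor("60min")).size()

print("gaps:", has_gaps, "| starts on the hour:", starts_on_hour)
print(rows_per_hour)
```

If has_gaps is true or any hour has fewer than 12 rows, the 12-row blocks drift across hour boundaries, and grouping on the timestamps themselves is probably the safer route.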