Calculating sums over date ranges

Time: 2018-12-26 13:38:23

Tags: python-3.x pandas

My DataFrame looks like this:

df= pd.DataFrame({'Date':['2007-01-01 07:14:00','2007-01-01 07:25:00','2007-01-01 08:00:00', '2007-01-01 09:14:00','2007-01-01 09:33:12'],'sent':[0.32,0.34,0.45,0.7,0.22]})

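Now I want to add a new column, sum, based on hourly date ranges. For example, for 2007-01-01 07:00:00 to 2007-01-01 08:00:00, sum = 0.32 + 0.34 = 0.66. For the next hour, 2007-01-01 08:00:00 to 2007-01-01 09:00:00, sum = 0.45, and for the third hour, 2007-01-01 09:00:00 to 2007-01-01 10:00:00, sum = 0.7 + 0.22 = 0.92. Thanks.

My desired output is: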

df= pd.DataFrame({'Date':['2007-01-01 07:14:00','2007-01-01 07:25:00','2007-01-01 08:00:00','2007-01-01 09:14:00','2007-01-01 09:33:12'],'sent':[0.32,0.34,0.45,0.7,0.22],'sum':['na',0.66,0.45,'na',0.92],'Datehour':['nan','2007-01-01 08:00:00','2007-01-01 09:00:00','nan','2007-01-01 10:00:00']})

1 answer:

Answer 0 (score: 2)

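Use pd.Grouper and group by 1-hour intervals (freq='1H'; pandas 2.2 and later deprecate the uppercase alias in favor of 'h'):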

# If necessary, convert to datetime.
# df.Date = pd.to_datetime(df.Date, errors='coerce')
df.groupby(pd.Grouper(key='Date', freq='1H')).sent.sum().reset_index()

                 Date  sent
0 2007-01-01 07:00:00  0.66
1 2007-01-01 08:00:00  0.45
2 2007-01-01 09:00:00  0.92
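
For reference, a self-contained version of the pd.Grouper approach might look like the sketch below. It rebuilds the question's sample data and converts the Date column first, since the question stores the dates as strings. The frequency is written as '60min' here on the assumption that it is equivalent to '1H' while avoiding differences in how pandas versions spell the hourly alias; the variable name hourly is only illustrative.

```python
import pandas as pd

# Sample data from the question; Date starts out as plain strings.
df = pd.DataFrame({
    "Date": ["2007-01-01 07:14:00", "2007-01-01 07:25:00", "2007-01-01 08:00:00",
             "2007-01-01 09:14:00", "2007-01-01 09:33:12"],
    "sent": [0.32, 0.34, 0.45, 0.7, 0.22],
})

# pd.Grouper with a frequency needs a datetime column, so convert first.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# One row per hourly bin, labelled by the start of the bin.
hourly = (
    df.groupby(pd.Grouper(key="Date", freq="60min"))["sent"]
      .sum()
      .reset_index()
)
print(hourly)
```

Each bin is labelled with the start of its hour, so the 07:00:00 row covers 07:00:00 up to (but not including) 08:00:00.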

Another option is resample:

df.set_index('Date').resample('1H').sum().reset_index()

                 Date  sent
0 2007-01-01 07:00:00  0.66
1 2007-01-01 08:00:00  0.45
2 2007-01-01 09:00:00  0.92
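
Both options return one row per hour, whereas the question asks for sum and Datehour columns on the original rows, filled only on the last row of each hour and labelled with the end of the hour. One possible way to get that shape is sketched below. It is not part of the answer above, it assumes the rows are already sorted by Date, and the helper names hour_start, hourly_sum and is_last are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": ["2007-01-01 07:14:00", "2007-01-01 07:25:00", "2007-01-01 08:00:00",
             "2007-01-01 09:14:00", "2007-01-01 09:33:12"],
    "sent": [0.32, 0.34, 0.45, 0.7, 0.22],
})
df["Date"] = pd.to_datetime(df["Date"])

# Start of the hour each row falls into (07:14 -> 07:00, 08:00 -> 08:00).
hour_start = df["Date"].dt.floor("60min")

# Hourly total broadcast back to every row of its hour.
hourly_sum = df.groupby(hour_start)["sent"].transform("sum")

# True only for the last row of each hour (assumes rows are sorted by Date).
is_last = ~hour_start.duplicated(keep="last")

# Keep the values on the last row of each hour; other rows become NaN / NaT.
df["sum"] = hourly_sum.where(is_last)
df["Datehour"] = (hour_start + pd.Timedelta(hours=1)).where(is_last)
print(df)
```

Unlike the desired output in the question, which uses the strings 'na' and 'nan', this sketch leaves the unfilled cells as real missing values (NaN and NaT), which tend to be easier to work with in later pandas operations.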