My DataFrame looks like this:
import pandas as pd
df = pd.DataFrame({'Date':['2007-01-01 07:14:00','2007-01-01 07:25:00','2007-01-01 08:00:00','2007-01-01 09:14:00','2007-01-01 09:33:12'],'sent':[0.32,0.34,0.45,0.7,0.22]})
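Now I want to add a new column, sum, that totals sent over hourly date ranges. For example, for the range 2007-01-01 07:00:00 to 2007-01-01 08:00:00, sum = 0.32 + 0.34 = 0.66. For the next hour, 2007-01-01 08:00:00 to 2007-01-01 09:00:00, sum = 0.45, and for the third hour, 2007-01-01 09:00:00 to 2007-01-01 10:00:00, sum = 0.7 + 0.22 = 0.92.
Thanks.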
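The hour ranges described above amount to truncating each timestamp to the start of its hour and grouping on that value. A minimal sketch of that idea using Series.dt.floor (an alternative technique, not necessarily what the asker had in mind), assuming the Date strings are first parsed with pd.to_datetime:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00',
             '2007-01-01 09:14:00', '2007-01-01 09:33:12'],
    'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
})
# The Date column holds strings, so parse it before doing time arithmetic.
df['Date'] = pd.to_datetime(df['Date'])

# Truncate each timestamp to the start of its hour; 08:00:00 maps to itself,
# so that row falls in the 08:00-09:00 range, as in the question.
hour_start = df['Date'].dt.floor('h')

# Sum 'sent' within each hour.
hourly = df.groupby(hour_start)['sent'].sum()
print(hourly)
```

Because floor maps an exact hour boundary to itself, each range includes its start time and excludes its end time.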
My desired output is:
df = pd.DataFrame({'Date':['2007-01-01 07:14:00','2007-01-01 07:25:00','2007-01-01 08:00:00','2007-01-01 09:14:00','2007-01-01 09:33:12'],'sent':[0.32,0.34,0.45,0.7,0.22],'sum':['na',0.66,0.45,'na',0.92],'Datehour':['nan','2007-01-01 08:00:00','2007-01-01 09:00:00','nan','2007-01-01 10:00:00']})
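The desired layout places each hour's total on the last row of that hour and the end of the hour in Datehour, leaving the other rows empty. One way to sketch this, assuming the rows are sorted by Date and that real missing values (NaN/NaT) are acceptable in place of the 'na'/'nan' strings:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00',
             '2007-01-01 09:14:00', '2007-01-01 09:33:12'],
    'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
})
df['Date'] = pd.to_datetime(df['Date'])

# Start of the hour each row belongs to (08:00:00 maps to itself).
hour_start = df['Date'].dt.floor('h')

# Hourly total of 'sent', repeated on every row of that hour.
hourly_total = df.groupby(hour_start)['sent'].transform('sum')

# True only for the last row of each hour (assumes rows are sorted by Date).
is_last = ~hour_start.duplicated(keep='last')

# Keep the total and the end-of-hour timestamp on the last row; other rows get NaN/NaT.
df['sum'] = hourly_total.where(is_last)
df['Datehour'] = (hour_start + pd.Timedelta(hours=1)).where(is_last)
print(df)
```

Using NaN/NaT instead of strings keeps sum numeric and Datehour a datetime column, so later arithmetic and filtering still work.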
Answer:
Use pd.Grouper and group by 1-hour intervals:
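# Date holds strings in the example above, so convert it to datetime first.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
# '1h' is the hourly alias; pandas 2.2+ deprecates the uppercase '1H'.
df.groupby(pd.Grouper(key='Date', freq='1h')).sent.sum().reset_index()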
Date sent
0 2007-01-01 07:00:00 0.66
1 2007-01-01 08:00:00 0.45
2 2007-01-01 09:00:00 0.92
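The 08:00:00 row lands in the 08:00 bin because, for an hourly frequency, pandas closes bins on the left by default: [07:00, 08:00), [08:00, 09:00), and so on. A small sketch contrasting that default with closed='right', which would instead pull 08:00:00 into the 07:00 bin:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00',
             '2007-01-01 09:14:00', '2007-01-01 09:33:12'],
    'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
})
df['Date'] = pd.to_datetime(df['Date'])

# Default: left-closed bins, so 08:00:00 belongs to [08:00, 09:00).
left = df.groupby(pd.Grouper(key='Date', freq='h'))['sent'].sum()

# closed='right': bins become (07:00, 08:00], still labeled by their left edge,
# so 08:00:00 joins 07:14 and 07:25 in the bin labeled 07:00.
right = df.groupby(pd.Grouper(key='Date', freq='h', closed='right'))['sent'].sum()

print(left)
print(right)
```

The default matches the question, which counts 08:00:00 toward the 08:00-09:00 hour.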
Another option is resample:
df.set_index('Date').resample('1h').sum().reset_index()
Date sent
0 2007-01-01 07:00:00 0.66
1 2007-01-01 08:00:00 0.45
2 2007-01-01 09:00:00 0.92
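resample also accepts label='right', which tags each bin with its end time instead of its start. That matches the Datehour values in the desired output (08:00, 09:00, 10:00). A sketch, with the output column names sum and Datehour taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00',
             '2007-01-01 09:14:00', '2007-01-01 09:33:12'],
    'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
})
df['Date'] = pd.to_datetime(df['Date'])

out = (
    df.set_index('Date')['sent']
      # Bins stay left-closed, [07:00, 08:00), but are labeled by their end (08:00).
      .resample('h', label='right')
      .sum()
      .rename_axis('Datehour')      # name the time index like the question's column
      .reset_index(name='sum')      # turn the summed Series into a 'sum' column
)
print(out)
```

This gives one row per hour rather than the per-row layout in the question; it can be merged back onto df on the hour-end timestamp if the original rows are needed.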