I have this dataframe:
Input:
import pandas as pd

# Date is parsed as datetime64, which pd.Grouper(key='Date', freq=...) requires
df = pd.DataFrame({
    'Date': pd.to_datetime(['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00',
                            '2007-01-01 09:14:00', '2007-01-01 09:33:12']),
    'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
    'var1': [44, 0.66, 0.45, 45, 0.92],
    'var2': ['nan', '2007-01-01 08:00:00', '2007-01-01 09:00:00', 'nan', '2007-01-01 10:00:00']})
Output:
                 Date  sent   var1                 var2
0 2007-01-01 07:14:00  0.32  44.00                  nan
1 2007-01-01 07:25:00  0.34   0.66  2007-01-01 08:00:00
2 2007-01-01 08:00:00  0.45   0.45  2007-01-01 09:00:00
3 2007-01-01 09:14:00  0.70  45.00                  nan
4 2007-01-01 09:33:12  0.22   0.92  2007-01-01 10:00:00
I applied this code to compute the hourly sum of the sent column:
df=df.groupby(pd.Grouper(key='Date', freq='1H')).sent.sum().reset_index()
Output:
                 Date  sent
0 2007-01-01 07:00:00  0.66
1 2007-01-01 08:00:00  0.45
2 2007-01-01 09:00:00  0.92
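pd.Grouper(key='Date', freq='1H') assigns each timestamp to the one-hour bin that starts at or before it and labels the group with that bin's start. A small sketch on the question's data (keeping only Date and sent) makes this visible: on this data, flooring each timestamp with Series.dt.floor('h') gives the same labels, so grouping on the floored values produces the same sums. The sketch uses the lowercase 'h' alias, which recent pandas versions prefer over 'H'.

```python
import pandas as pd

# The question's data, reduced to the two columns used by the hourly sum.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2007-01-01 07:14:00', '2007-01-01 07:25:00',
                            '2007-01-01 08:00:00', '2007-01-01 09:14:00',
                            '2007-01-01 09:33:12']),
    'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
})

# Hourly sums via pd.Grouper, as in the question.
by_grouper = df.groupby(pd.Grouper(key='Date', freq='h'))['sent'].sum()

# The same bins built explicitly: floor each timestamp to the start of its hour.
hour_start = df['Date'].dt.floor('h')
by_floor = df.groupby(hour_start)['sent'].sum()

print(hour_start.tolist())
print(by_floor)
```

Seeing the bin labels as an ordinary column is also what makes it possible to reuse them as a groupby key elsewhere, for example with transform.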
How can I keep the other columns, such as var1 and var2, when using groupby? Any help would be appreciated.
Answer 0 (score: 2)
If I understand correctly, you can use transform:
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00',
                            '2007-01-01 09:14:00', '2007-01-01 09:33:12']),
    'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
    'var1': [44, 0.66, 0.45, 45, 0.92],
    'var2': ['nan', '2007-01-01 08:00:00', '2007-01-01 09:00:00', 'nan', '2007-01-01 10:00:00']})

# transform('sum') returns one value per original row (that row's group total),
# so every existing column is kept and the hourly sum is added alongside them
df['sum'] = df.groupby(df.Date.dt.hour).sent.transform('sum')
print(df)
Output:
                 Date  sent   var1                 var2   sum
0 2007-01-01 07:14:00  0.32  44.00                  nan  0.66
1 2007-01-01 07:25:00  0.34   0.66  2007-01-01 08:00:00  0.66
2 2007-01-01 08:00:00  0.45   0.45  2007-01-01 09:00:00  0.45
3 2007-01-01 09:14:00  0.70  45.00                  nan  0.92
4 2007-01-01 09:33:12  0.22   0.92  2007-01-01 10:00:00  0.92
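One caveat: df.Date.dt.hour keys only on the hour of the day, so rows from 07:xx on different dates end up in the same group. That is harmless for the single-day sample above, but if the data spans several days, keying the transform on the timestamp floored to the hour keeps each calendar hour separate, matching the bins the question's pd.Grouper(freq='1H') produces. A minimal sketch, using a hypothetical two-day dataset; the column names sum_by_hour_of_day and hourly_sum are illustrative only:

```python
import pandas as pd

# Hypothetical two-day data: the 07:xx rows fall on different dates.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2007-01-01 07:14:00', '2007-01-01 07:25:00',
                            '2007-01-02 07:10:00', '2007-01-02 08:05:00']),
    'sent': [0.32, 0.34, 0.50, 0.45],
    'var1': [44, 0.66, 1.5, 0.45],
})

# Keyed on hour-of-day: the 07:xx rows from both dates are summed together.
df['sum_by_hour_of_day'] = df.groupby(df['Date'].dt.hour)['sent'].transform('sum')

# Keyed on the timestamp floored to the hour: each calendar hour is its own group.
# transform('sum') broadcasts each group's total back onto every row,
# so var1 and any other columns are kept unchanged.
df['hourly_sum'] = df.groupby(df['Date'].dt.floor('h'))['sent'].transform('sum')

print(df)
```

Here the first three rows all get 1.16 in sum_by_hour_of_day, while hourly_sum gives 0.66 for the two rows on 2007-01-01 and 0.50 for the 07:10 row on 2007-01-02.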