Apply pandas groupby while keeping the other columns

Time: 2018-12-26 23:48:07

Tags: python pandas pandas-groupby

I have this DataFrame. Input:

import pandas as pd

df = pd.DataFrame({'Date': ['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00', '2007-01-01 09:14:00', '2007-01-01 09:33:12'],
                   'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
                   'var1': [44, 0.66, 0.45, 45, 0.92],
                   'var2': ['nan', '2007-01-01 08:00:00', '2007-01-01 09:00:00', 'nan', '2007-01-01 10:00:00']})

Output:

                  Date  sent   var1                 var2
0  2007-01-01 07:14:00  0.32  44.00                  nan
1  2007-01-01 07:25:00  0.34   0.66  2007-01-01 08:00:00
2  2007-01-01 08:00:00  0.45   0.45  2007-01-01 09:00:00
3  2007-01-01 09:14:00  0.70  45.00                  nan
4  2007-01-01 09:33:12  0.22   0.92  2007-01-01 10:00:00
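I applied this code to compute the sum of the sent column for each hour. Input:

df['Date'] = pd.to_datetime(df['Date'])  # pd.Grouper with freq needs datetime values, not strings
df = df.groupby(pd.Grouper(key='Date', freq='1H')).sent.sum().reset_index()

Output:

                 Date  sent
0 2007-01-01 07:00:00  0.66
1 2007-01-01 08:00:00  0.45
2 2007-01-01 09:00:00  0.92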

The problem: I want to keep the other columns, such as var1 and var2, when using groupby. I would appreciate any help.

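1 Answer:

Answer 0: (score: 2)

If I understand correctly, you can use transform, which returns one value per original row, so the other columns stay in place: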

import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00', '2007-01-01 09:14:00',
                            '2007-01-01 09:33:12']),
                   'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
                   'var1': [44, 0.66, 0.45, 45, 0.92],
                   'var2': ['nan', '2007-01-01 08:00:00', '2007-01-01 09:00:00', 'nan', '2007-01-01 10:00:00']})


df['sum'] = df.groupby(df.Date.dt.hour).sent.transform('sum')

print(df)

Output:

                 Date  sent   var1                 var2   sum
0 2007-01-01 07:14:00  0.32  44.00                  nan  0.66
1 2007-01-01 07:25:00  0.34   0.66  2007-01-01 08:00:00  0.66
2 2007-01-01 08:00:00  0.45   0.45  2007-01-01 09:00:00  0.45
3 2007-01-01 09:14:00  0.70  45.00                  nan  0.92
4 2007-01-01 09:33:12  0.22   0.92  2007-01-01 10:00:00  0.92
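
One caveat worth noting: df.Date.dt.hour groups by hour of day only, so rows at 07:xx on different dates would fall into the same group, whereas the question's pd.Grouper(freq='1H') keeps each calendar hour separate. The sample data covers a single day, so the two give the same sums here. A minimal sketch of the difference, assuming a made-up third row on 2007-01-02 (not part of the original data) and illustrative column names:

```python
import pandas as pd

# Two rows from the question plus one hypothetical row on the next day.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2007-01-01 07:14:00',
                            '2007-01-01 07:25:00',
                            '2007-01-02 07:10:00']),  # assumed extra row
    'sent': [0.32, 0.34, 0.50],
})

# Hour of day only: 07:xx on both dates lands in a single group.
df['sum_by_hour_of_day'] = (
    df.groupby(df['Date'].dt.hour)['sent'].transform('sum')
)

# Timestamp floored to the hour: each calendar hour is its own group.
df['sum_by_calendar_hour'] = (
    df.groupby(df['Date'].dt.floor('h'))['sent'].transform('sum')
)

print(df)
```

With this input, the hour-of-day version assigns 1.16 to all three rows, while the floored version assigns 0.66 to the two rows on 2007-01-01 and 0.50 to the row on 2007-01-02. If the data can span more than one day, the floored key is probably the closer match to what the question asked for.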