作为here我需要计算colums持续时间和km的平均值 值== 1且值= 0的行。 这次我希望聚合期是灵活的。
df
Out[20]:
Date duration km value
0 2015-03-28 09:07:00.800001 0 0 0
1 2015-03-28 09:36:01.819998 1 2 1
2 2015-03-30 09:36:06.839997 1 3 1
3 2015-03-30 09:37:27.659997 nan 5 0
4 2015-04-22 09:51:40.440003 3 7 0
5 2015-04-23 10:15:25.080002 0 nan 1
对于1天的聚合期,我可以使用之前建议的解决方案:
df.pivot_table(values=['duration','km'],columns=['value'],index=df['Date'].dt.date,aggfunc='mean'
ndf.columns = [i[0]+str(i[1]) for i in ndf.columns]
duration0 duration1 km0 km1
Date
2015-03-28 0.0 1.0 0.0 2.0
2015-03-30 NaN 1.0 5.0 3.0
2015-04-22 3.0 NaN 7.0 NaN
2015-04-23 NaN 0.0 NaN NaN
但是,我不知道如何更改聚合期间,例如,我想将其作为函数的参数传递...
出于这个原因,pd.Grouper(freq=freq_aggregation)
,freq_aggregation = 'd'
或'60s'
的方法将首选......
答案 0 :(得分:1)
让我们使用pd.Grouper
,unstack
和列映射:
freq_str = '60s'
df_out = df.groupby([pd.Grouper(freq=freq_str, key='Date'),'value'])['duration','km'].agg('mean').unstack()
df_out.columns = df_out.columns.map('{0[0]}{0[1]}'.format)
df_out
输出:
duration0 duration1 km0 km1
Date
2015-03-28 09:07:00 0.0 NaN 0.0 NaN
2015-03-28 09:36:00 NaN 1.0 NaN 2.0
2015-03-30 09:36:00 NaN 1.0 NaN 3.0
2015-03-30 09:37:00 NaN NaN 5.0 NaN
2015-04-22 09:51:00 3.0 NaN 7.0 NaN
2015-04-23 10:15:00 NaN 0.0 NaN NaN
现在,让我们将freq_str改为' D':
freq_str = 'D'
df_out = df.groupby([pd.Grouper(freq=freq_str, key='Date'),'value'])['duration','km'].agg('mean').unstack()
df_out.columns = df_out.columns.map('{0[0]}{0[1]}'.format)
print(df_out)
输出:
duration0 duration1 km0 km1
Date
2015-03-28 0.0 1.0 0.0 2.0
2015-03-30 NaN 1.0 5.0 3.0
2015-04-22 3.0 NaN 7.0 NaN
2015-04-23 NaN 0.0 NaN NaN
答案 1 :(得分:1)
您可以将石斑鱼传递到数据透视表的索引。希望这就是你要找的东西,即
ndf = df.pivot_table(values=['duration','km'],columns=['value'],index=pd.Grouper(key='Date', freq='60s'),aggfunc='mean')
ndf.columns = [i[0]+str(i[1]) for i in ndf.columns]
输出:
duration0 duration1 km0 km1 Date 2015-03-28 09:07:00 0.0 NaN 0.0 NaN 2015-03-28 09:36:00 NaN 1.0 NaN 2.0 2015-03-30 09:36:00 NaN 1.0 NaN 3.0 2015-03-30 09:37:00 NaN NaN 5.0 NaN 2015-04-22 09:51:00 3.0 NaN 7.0 NaN 2015-04-23 10:15:00 NaN 0.0 NaN NaN
如果频率为D
则
duration0 duration1 km0 km1 Date 2015-03-28 0.0 1.0 0.0 2.0 2015-03-30 NaN 1.0 5.0 3.0 2015-04-22 3.0 NaN 7.0 NaN 2015-04-23 NaN 0.0 NaN NaN
答案 2 :(得分:0)
使用groupby
df = df.set_index('Date')
df.groupby([pd.TimeGrouper('D'), 'value']).mean()
duration km
Date value
2017-10-11 0 1.500000 4.0
1 0.666667 2.5
df.groupby([pd.TimeGrouper('60s'), 'value']).mean()
duration km
Date value
2017-10-11 09:07:00 0 0.0 0.0
2017-10-11 09:36:00 1 1.0 2.5
2017-10-11 09:37:00 0 NaN 5.0
2017-10-11 09:51:00 0 3.0 7.0
2017-10-11 10:15:00 1 0.0 NaN
如果你想要它取消堆叠,那么将它拆开。
df.groupby([pd.TimeGrouper('D'), 'value']).mean().unstack()
duration km
value 0 1 0 1
Date
2017-10-11 1.50 0.67 4.00 2.50