I have a DataFrame and I want to compute the 12-hour average of every column. The DataFrame has more than 200,000 rows.
DateTime Speed TRQ ... PtoP3 RMS3 Crest3
0 2016-07-01 00:00 994 35.4 ... NA NA NA
1 2016-07-01 00:01 995 34.6 ... NA NA NA
2 2016-07-01 00:02 995 34 ... NA NA NA
I wrote this:
Present_data.to_datetime(Present_data['DateTime'])
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H', key='DateTime')).mean()
print(Total_12hravg_all)
and got this error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
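A minimal sketch that reproduces the error, assuming the DateTime column holds plain text, as it usually does after reading a CSV (the three-row frame is hypothetical sample data). It also shows why the conversion line in the question cannot help: DataFrame has no to_datetime method, so the column is never converted.

```python
import pandas as pd

# Hypothetical miniature of the question's data; DateTime is stored as text
df = pd.DataFrame({
    "DateTime": ["2016-07-01 00:00", "2016-07-01 00:01", "2016-07-01 12:05"],
    "Speed": [994, 995, 995],
})

# The column is not datetime64, so time-based grouping cannot bin it
is_datetime = pd.api.types.is_datetime64_any_dtype(df["DateTime"])
print("DateTime is datetime64:", is_datetime)  # False

# to_datetime is a module-level function (pd.to_datetime), not a DataFrame method
print("DataFrame has to_datetime:", hasattr(df, "to_datetime"))  # False

# Grouping the text column by a 12-hour frequency raises the TypeError
error_message = ""
try:
    df.groupby(pd.Grouper(freq="12h", key="DateTime")).mean()
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```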
Answer 0 (score: 1)
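If DateTime is a column, your solution works once the column is actually converted. to_datetime is the top-level function pd.to_datetime, not a DataFrame method, so assign its result back to the column: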
import pandas as pd
Present_data['DateTime'] = pd.to_datetime(Present_data['DateTime'])
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H', key='DateTime')).mean()
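A self-contained sketch of this fix on synthetic data. The 36 hours of per-minute readings and the Speed/TRQ values are made up for illustration; only the column and variable names come from the question. It uses the lowercase "12h" alias because pandas 2.2 deprecated the uppercase "12H" spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Present_data: 36 hours of per-minute readings,
# with DateTime stored as text, as it typically is after reading a CSV
times = pd.date_range("2016-07-01 00:00", periods=36 * 60, freq="min")
Present_data = pd.DataFrame({
    "DateTime": times.strftime("%Y-%m-%d %H:%M"),
    "Speed": np.arange(len(times), dtype=float),
    "TRQ": np.full(len(times), 35.0),
})

# Convert the text column to datetime64 so the Grouper can bin it
Present_data["DateTime"] = pd.to_datetime(Present_data["DateTime"])

# One row per 12-hour window, holding the mean of every other column
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq="12h", key="DateTime")).mean()
print(Total_12hravg_all)
```

Each result row is labelled with the start of its window (00:00 or 12:00), and windows are closed on the left, so a reading at 11:59 falls in the 00:00 window.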
Another solution is to use resample with the on parameter:
Present_data['DateTime'] = pd.to_datetime(Present_data['DateTime'])
Total_12hravg_all = Present_data.resample('12H', on='DateTime').mean()
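A sketch, on hypothetical per-minute sample data, that checks resample with on= produces the same 12-hour means as the Grouper approach; in practice resample(..., on=col) acts as a shortcut for grouping by a time Grouper keyed on that column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 24 hours of per-minute readings, already datetime64
times = pd.date_range("2016-07-01 00:00", periods=24 * 60, freq="min")
df = pd.DataFrame({
    "DateTime": times,
    "Speed": np.linspace(990.0, 1000.0, len(times)),
    "TRQ": np.linspace(30.0, 40.0, len(times)),
})

# Approach from the question: groupby with a time Grouper keyed on the column
via_grouper = df.groupby(pd.Grouper(freq="12h", key="DateTime")).mean()

# Alternative: resample, naming the column that holds the timestamps
via_resample = df.resample("12h", on="DateTime").mean()

# Compare bin labels and the averaged values
same = via_grouper.index.equals(via_resample.index) and np.allclose(
    via_grouper.to_numpy(), via_resample.to_numpy()
)
print(via_resample)
print("Same result:", same)
```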
Or create a DatetimeIndex:
Present_data['DateTime'] = pd.to_datetime(Present_data['DateTime'])
Present_data = Present_data.set_index('DateTime')
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H')).mean()
# or, equivalently, with resample:
#Total_12hravg_all = Present_data.resample('12H').mean()
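Once DateTime is the index, resample needs no on= argument. The sketch below uses hypothetical readings with a gap to show a behaviour worth knowing for long sensor logs: resample emits every 12-hour window between the first and last timestamp, and windows with no readings get NaN means.

```python
import pandas as pd

# Hypothetical readings with a gap between 01:00 on July 1 and 13:00 on July 2
Present_data = pd.DataFrame({
    "DateTime": ["2016-07-01 00:00", "2016-07-01 01:00", "2016-07-02 13:00"],
    "Speed": [994.0, 996.0, 1000.0],
})
Present_data["DateTime"] = pd.to_datetime(Present_data["DateTime"])
Present_data = Present_data.set_index("DateTime")

# With a DatetimeIndex, resample bins on the index directly
Total_12hravg_all = Present_data.resample("12h").mean()
print(Total_12hravg_all)
# The windows starting 2016-07-01 12:00 and 2016-07-02 00:00 have no rows -> NaN
```

If those empty windows are unwanted, Total_12hravg_all.dropna(how="all") removes them.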
If DateTime is the index:
Present_data.index = pd.to_datetime(Present_data.index)
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H')).mean()
# or, equivalently, with resample:
#Total_12hravg_all = Present_data.resample('12H').mean()
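A sketch for the case where the timestamps are the index but still stored as text, as can happen when a CSV is read with index_col but without date parsing (the four-row sample is hypothetical). Converting the index turns it into a DatetimeIndex, after which the Grouper needs no key.

```python
import pandas as pd

# Hypothetical data whose index holds timestamps as plain strings
Present_data = pd.DataFrame(
    {"Speed": [994.0, 995.0, 995.0, 997.0]},
    index=["2016-07-01 00:00", "2016-07-01 00:01", "2016-07-01 12:00", "2016-07-01 12:01"],
)
print(type(Present_data.index).__name__)

# Convert the index itself; the result is a DatetimeIndex
Present_data.index = pd.to_datetime(Present_data.index)
print(type(Present_data.index).__name__)  # DatetimeIndex

# No key needed: the Grouper bins on the DatetimeIndex
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq="12h")).mean()
print(Total_12hravg_all)
```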
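Final solution, which also converts non-numeric values (such as the NA entries) to NaN so that every column can be averaged: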
Present_data['DateTime'] = pd.to_datetime(Present_data['DateTime'])
Present_data = Present_data.set_index('DateTime')
# convert non-numeric values to NaN
Present_data = Present_data.apply(lambda x: pd.to_numeric(x, errors='coerce'))
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq='12H')).mean()
# or, equivalently, with resample:
#Total_12hravg_all = Present_data.resample('12H').mean()
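A runnable sketch of the final solution on a hypothetical four-row sample in which the numbers arrive as text and RMS3 contains the string "NA", roughly the situation the sample output in the question suggests. After pd.to_numeric with errors="coerce", every column is numeric and mean() skips NaN values; a 12-hour window in which a column is entirely NaN still yields NaN for that column.

```python
import pandas as pd

# Hypothetical sample: values stored as text, with "NA" placeholders in RMS3
Present_data = pd.DataFrame({
    "DateTime": ["2016-07-01 00:00", "2016-07-01 00:01", "2016-07-01 12:00", "2016-07-01 12:01"],
    "Speed": ["994", "995", "995", "997"],
    "RMS3": ["NA", "NA", "1.5", "2.5"],
})

Present_data["DateTime"] = pd.to_datetime(Present_data["DateTime"])
Present_data = Present_data.set_index("DateTime")

# Convert every remaining column to numbers; unparseable entries like "NA" become NaN
Present_data = Present_data.apply(lambda x: pd.to_numeric(x, errors="coerce"))
print(Present_data.dtypes)

# mean() skips NaN, so only real readings contribute to each 12-hour average
Total_12hravg_all = Present_data.groupby(pd.Grouper(freq="12h")).mean()
print(Total_12hravg_all)
```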