Grouping and resampling with a MultiIndex in pandas

Time: 2019-12-18 22:35:07

Tags: python pandas

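I have hourly time series data. I am trying to group it by Identification and then average the Quantity over 24-hour windows that start at 04:00.

I created a data sample: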

import pandas as pd

data = {'Identification': pd.Series(['21Z0000000004774', '21Z0000000004774', '21Z0000000004774', '21Z0000000001111', '21Z0000000001111', '21Z0000000001111', '21Z0000000005000', '21Z0000000005000', '21Z0000000005000']),
        'Quantity': pd.Series([1, 2, 3, 10, 10, 10, 4, 3, 2]),
        'StartDate': pd.to_datetime(['2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00']),
        'EndDate': pd.to_datetime(['2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00']),
        'Direction': pd.Series(['Z02', 'Z02', 'Z02', 'Z02', 'Z02', 'Z02', 'Z02', 'Z02', 'Z02'])}

df = pd.DataFrame(data, columns = ['Identification','Quantity','StartDate', 'EndDate','Direction'])

I tried to combine the code from these two Stack Overflow examples, but could not get it to work: "Resample hourly TimeSeries with certain starting hour" and "Resampling a pandas dataframe with multi-index containing timeseries".

I tried:

def resampler(x):
    return x.set_index(['StartDate', 'Identification']).resample(rule='24H', base=4,).mean()

df.reset_index(level=1).groupby(level=0).apply(resampler)

But I get the following error:

ValueError: level > 0 or level < -1 only valid with MultiIndex
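
The error is consistent with the sample frame having only the default single-level index: `df` as built above has no MultiIndex, so there is no level 1 to reset or group on. The following is a minimal sketch of one way to restructure the attempt, assuming pandas 1.1 or later (where `offset` is available in place of `base`). It puts `StartDate` on the index, groups on the `Identification` column, and resamples within each group. The shortened frame and the name `per_id_mean` are stand-ins for illustration, not part of the original code.

```python
import pandas as pd

# Stand-in for the sample data above (only the columns needed here).
df = pd.DataFrame({
    "Identification": ["21Z0000000004774"] * 3
                      + ["21Z0000000001111"] * 3
                      + ["21Z0000000005000"] * 3,
    "Quantity": [1, 2, 3, 10, 10, 10, 4, 3, 2],
    "StartDate": pd.to_datetime(
        ["2019-09-06 04:00:00", "2019-09-06 05:00:00", "2019-09-06 06:00:00"] * 3
    ),
})

# The default RangeIndex has a single level, so level=1 does not exist.
print(df.index.nlevels)

# resample needs a datetime index; grouping first keeps each identifier separate.
# offset="4h" shifts the 24-hour bins so they start at 04:00 (assumes pandas >= 1.1).
per_id_mean = (
    df.set_index("StartDate")
      .groupby("Identification")["Quantity"]
      .resample("24h", offset="4h")
      .mean()
)
print(per_id_mean)
```

The result is a Series indexed by `Identification` and the start of each 24-hour window, so each identifier keeps its own average.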

The following line averages my data, but ignores the different identification codes:

df.resample(rule='24H', closed='left', label='left', base=4, level=0).mean()
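
Resampling the whole frame pools every identifier into the same time bin, which would explain why the codes are ignored. One possible alternative, sketched here under the assumption of pandas 1.1 or later, is to group on `Identification` together with a `pd.Grouper` on `StartDate`. The two rows dated 2019-09-07 are hypothetical additions, included only to show where the 04:00 boundary falls. The name `daily_mean` is illustrative as well.

```python
import pandas as pd

# Reduced sample: two identifiers from the data above, plus two hypothetical
# next-day readings (03:00 and 04:00) to show where the window boundary falls.
df = pd.DataFrame({
    "Identification": ["21Z0000000004774"] * 5 + ["21Z0000000001111"] * 3,
    "Quantity": [1, 2, 3, 6, 8, 10, 10, 10],
    "StartDate": pd.to_datetime([
        "2019-09-06 04:00:00", "2019-09-06 05:00:00", "2019-09-06 06:00:00",
        "2019-09-07 03:00:00",  # hypothetical: still inside the first window
        "2019-09-07 04:00:00",  # hypothetical: first hour of the next window
        "2019-09-06 04:00:00", "2019-09-06 05:00:00", "2019-09-06 06:00:00",
    ]),
})

# Group by identifier and by 24-hour bins on StartDate that begin at 04:00.
# `offset` is assumed to be available (pandas >= 1.1); it replaces `base`.
daily_mean = (
    df.groupby(["Identification",
                pd.Grouper(key="StartDate", freq="24h", offset="4h")])["Quantity"]
      .mean()
      .reset_index()
)
print(daily_mean)
```

Because the time column is passed through `key=`, the frame can keep its default index and no MultiIndex is needed.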

Any help would be greatly appreciated.

0 answers:

No answers yet.