Question

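I have hourly time series data, and I am trying to group it by identification and then average the quantity over each 24-hour period starting at 04:00.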

I created a sample of the data:

import pandas as pd

# All timestamps are written in ISO format so they parse unambiguously as 6 September 2019
data = {'Identification' : pd.Series(['21Z0000000004774', '21Z0000000004774', '21Z0000000004774', '21Z0000000001111','21Z0000000001111','21Z0000000001111','21Z0000000005000','21Z0000000005000','21Z0000000005000']),
   'Quantity' : pd.Series([1, 2, 3, 10, 10, 10, 4, 3, 2]),
   'StartDate' : pd.to_datetime(['2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00','2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00']),
   'EndDate' : pd.to_datetime(['2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00','2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00']),
   'Direction' : pd.Series(['Z02', 'Z02', 'Z02', 'Z02','Z02','Z02','Z02','Z02','Z02'])}

df = pd.DataFrame(data, columns = ['Identification','Quantity','StartDate', 'EndDate','Direction'])

I tried to combine the code from these two Stack Overflow examples, but could not get it to work :( "Resample hourly TimeSeries with certain starting hour" and "Resampling a pandas dataframe with multi-index containing timeseries".

I tried:

def resampler(x):
    return x.set_index(['StartDate', 'Identification']).resample(rule='24H', base=4).mean()

df.reset_index(level=1).groupby(level=0).apply(resampler)

But I get the following error:

ValueError: level > 0 or level < -1 only valid with MultiIndex

The following line averages my data, but ignores the different identification codes:

df.resample(rule='24H', closed='left', label='left', base=4, level=0).mean()

Any help would be greatly appreciated.
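
One possible direction, offered as a minimal sketch rather than a confirmed answer: group on Identification together with a pd.Grouper over StartDate, so that each identification code gets its own 24-hour bins. The sketch assumes a pandas version where the offset argument is available (pandas 1.1 or later, where it replaces the deprecated base argument), assumes the sample timestamps are all meant to be 6 September 2019, and keeps only the columns needed for the average.

```python
import pandas as pd

# Trimmed copy of the sample data from the question (ISO timestamps assumed).
df = pd.DataFrame({
    "Identification": ["21Z0000000004774"] * 3
                      + ["21Z0000000001111"] * 3
                      + ["21Z0000000005000"] * 3,
    "Quantity": [1, 2, 3, 10, 10, 10, 4, 3, 2],
    "StartDate": pd.to_datetime(
        ["2019-09-06 04:00:00", "2019-09-06 05:00:00", "2019-09-06 06:00:00"] * 3
    ),
})

# Group by identification code and by 24-hour windows on StartDate.
# offset="4h" shifts the bin edges from midnight to 04:00.
daily_mean = (
    df.groupby(
        ["Identification", pd.Grouper(key="StartDate", freq="24h", offset="4h")]
    )["Quantity"]
    .mean()
)

print(daily_mean)
```

The result is a Series indexed by (Identification, StartDate), where StartDate is the left edge of each 24-hour window. Because the grouping key is a regular column, no MultiIndex has to be built beforehand.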

Grouping and resampling with a MultiIndex in pandas

0 answers: