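I have hourly time-series data. I'm trying to group it by Identification and then average Quantity over 24-hour periods that start at 04:00.
Here is a sample of the data: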
import pandas as pd

data = {'Identification' : pd.Series(['21Z0000000004774', '21Z0000000004774', '21Z0000000004774', '21Z0000000001111','21Z0000000001111','21Z0000000001111','21Z0000000005000','21Z0000000005000','21Z0000000005000']),
'Quantity' : pd.Series([1, 2, 3, 10, 10, 10, 4, 3, 2]),
# all timestamps in one ISO format; '06/09/2019' would be read as June 9 (or rejected as a mixed format by pandas >= 2.0)
'StartDate' : pd.to_datetime(['2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00','2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00']),
'EndDate' : pd.to_datetime(['2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00','2019-09-06 05:00:00', '2019-09-06 06:00:00', '2019-09-06 07:00:00']),
'Direction' : pd.Series(['Z02', 'Z02', 'Z02', 'Z02','Z02','Z02','Z02','Z02','Z02'])}
df = pd.DataFrame(data, columns = ['Identification','Quantity','StartDate', 'EndDate','Direction'])
I tried combining the code from these two Stack Overflow questions, but couldn't get it to work: "Resample hourly TimeSeries with certain starting hour" and "Resampling a pandas dataframe with multi-index containing timeseries".
What I tried:
def resampler(x):
    return x.set_index(['StartDate', 'Identification']).resample(rule='24H', base=4,).mean()
df.reset_index(level=1).groupby(level=0).apply(resampler)
But I get the following error:
ValueError: level > 0 or level < -1 only valid with MultiIndex
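The sample `df` has a plain RangeIndex, so the level-based `reset_index`/`groupby` calls in the attempt have no MultiIndex to work with. A minimal sketch of the same groupby-then-resample idea that groups on the `Identification` column directly instead, assuming pandas >= 1.1 (where `offset` replaced the deprecated `base` argument, which pandas 2.0 removed). The DataFrame is rebuilt with only the columns the calculation needs.

```python
import pandas as pd

# Trimmed copy of the sample data (ISO date strings, only the needed columns).
df = pd.DataFrame({
    'Identification': ['21Z0000000004774'] * 3 + ['21Z0000000001111'] * 3 + ['21Z0000000005000'] * 3,
    'Quantity': [1, 2, 3, 10, 10, 10, 4, 3, 2],
    'StartDate': pd.to_datetime(['2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00'] * 3),
})

def resampler(x):
    # x holds the rows of one Identification: index them by time and
    # average Quantity over 24-hour bins shifted to start at 04:00.
    return x.set_index('StartDate').resample('24h', offset='4h')[['Quantity']].mean()

# Group on the column, not an index level; selecting the other columns keeps
# the grouping column out of the frame passed to resampler.
result = df.groupby('Identification')[['StartDate', 'Quantity']].apply(resampler)
print(result)
```

Returning a one-column DataFrame (rather than a Series) from `resampler` keeps the combined result indexed by (Identification, StartDate).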
The following line does average my data into 24-hour bins, but it lumps all Identification values together:
df.resample(rule='24H', closed='left', label='left', base=4, level=0).mean()
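To keep the Identification values apart, the time binning can be made one of several groupby keys. A minimal sketch using `pd.Grouper` with `freq` and `offset`, assuming pandas >= 1.1; with the default `origin='start_day'`, bins start at midnight of the first day plus the 4-hour offset, i.e. at 04:00, closed on the left.

```python
import pandas as pd

# Trimmed copy of the sample data (ISO date strings, only the needed columns).
df = pd.DataFrame({
    'Identification': ['21Z0000000004774'] * 3 + ['21Z0000000001111'] * 3 + ['21Z0000000005000'] * 3,
    'Quantity': [1, 2, 3, 10, 10, 10, 4, 3, 2],
    'StartDate': pd.to_datetime(['2019-09-06 04:00:00', '2019-09-06 05:00:00', '2019-09-06 06:00:00'] * 3),
})

# Group by the ID column and, in the same call, by 24-hour bins of StartDate
# shifted to begin at 04:00.
daily_mean = (
    df.groupby(['Identification', pd.Grouper(key='StartDate', freq='24h', offset='4h')])['Quantity']
      .mean()
)
print(daily_mean)
```

For the sample data every row falls in the bin starting at 2019-09-06 04:00, so each Identification gets a single mean: 10.0 for ...1111, 2.0 for ...4774 and 3.0 for ...5000.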
Any help would be greatly appreciated.