我正在做一些电力负荷预测,需要将数据集拆分为自定义的训练和测试集。为此,我有3年的电力负荷数据,从01-01-2015 00:30:00到01-01-2018 00:00:00。每半小时记录一次负载。现在,我想将训练和评估期分为312组,连续84小时(36 + 48),并用它们来定义5倍验证划分。
人们可以注意到,奇数集从00:00开始,偶数集从12:00开始。评估期间缺少斑点也会发生这种情况。对于5倍拆分,必须像这样完成:
我对数据集的划分形成验证折叠感到震惊。我尝试使用sklearn的k倍和时间序列拆分,但没有得到我想要的。
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Data pre-processing
state = {0: 'NSW', 1: 'QLD', 2: 'SA', 3: 'TAS', 4: 'VIC'}
year = {0: '2015', 1: '2016', 2: '2017'}
#year = {0: '2017'}
df_nsw = pd.DataFrame()
df_qld = pd.DataFrame()
df_sa = pd.DataFrame()
df_tas = pd.DataFrame()
df_vic = pd.DataFrame()
df_nsw_test = pd.DataFrame()
df_qld_test = pd.DataFrame()
df_sa_test = pd.DataFrame()
df_tas_test = pd.DataFrame()
df_vic_test = pd.DataFrame()
df = {'NSW': df_nsw, 'QLD': df_qld, 'SA': df_sa, 'TAS': df_tas, 'VIC': df_vic}
df_test = {'NSW': df_nsw_test, 'QLD': df_qld_test, 'SA': df_sa_test, 'TAS': df_tas_test, 'VIC': df_vic_test}
for st in state.values():
for ye in year.values():
for mn in range(1,13):
if mn < 10:
dataset = pd.read_csv('./datasets/train/' + st + '/PRICE_AND_DEMAND_' + ye + '0' + str(mn) +'_' + st + '1.csv')
else:
dataset = pd.read_csv('./datasets/train/' + st + '/PRICE_AND_DEMAND_' + ye + str(mn) +'_' + st + '1.csv')
df[st] = df[st].append(dataset.iloc[:,1:3])
df[st] = df[st].set_index('SETTLEMENTDATE')
for st in state.values():
dataset = pd.read_csv('./datasets/test/' + st + '/PRICE_AND_DEMAND_201801_' + st + '1.csv')
df_test[st] = df_test[st].append(dataset.iloc[:,1:3])
df_test[st] = df_test[st].set_index('SETTLEMENTDATE')
plt.plot(df['NSW'].iloc[:,0].values)
plt.show()
plt.plot(df['QLD'].iloc[:,0].values)
plt.show()
plt.plot(df['SA'].iloc[:,0].values)
plt.show()
plt.plot(df['TAS'].iloc[:,0].values)
plt.show()
plt.plot(df['VIC'].iloc[:,0].values)
plt.show()
数据集已上传here。有什么建议吗?