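I am trying to build a model that predicts energy production using a VARMA (or VAR) model. (If you have any suggestions other than VARMA or VAR, please let me know.)
The data I have available for training looks like this: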
ID Label House Year Month Temperature Daylight EnergyProduction
0 0 1 2011 7 26.2 178.9 740
1 1 1 2011 8 25.8 169.7 731
2 2 1 2011 9 22.8 170.2 694
3 3 1 2011 10 16.4 169.1 688
4 4 1 2011 11 11.4 169.1 650
5 5 1 2011 12 4.2 199.5 763
...............
11995 19 500 2013 2 4.2 201.8 638
11996 20 500 2013 3 11.2 234 778
11997 21 500 2013 4 13.6 237.1 758
11998 22 500 2013 5 19.2 258.4 838
11999 23 500 2013 6 22.7 122.9 586
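As shown above, I can train on data from July 2011 to May 2013. From that, I want to predict the energy production of each of the 500 houses for June 2013.
Using statsmodels.api.tsa.filters.hpfilter, I managed to remove the trend component from "Daylight".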
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
df = pd.read_csv('../../data/training_dataset_500.csv')
x = df.loc[df.House==1, 'Daylight']
x_smoothed, x_trend = sm.tsa.filters.hpfilter(x, lamb=129600)
fig, axes = plt.subplots(figsize=(12,4), ncols=3)
axes[0].plot(x)
axes[0].set_title('raw x')
axes[1].plot(x_trend)
axes[1].set_title('trend')
axes[2].plot(x_smoothed)
axes[2].set_title('smoothed x')
plt.show()
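I am stuck on two problems here.
Problem 1
I cannot remove the seasonal component.
Although I was able to remove the trend component, I could not remove the seasonal component to make the time series stationary. (I need the series to be stationary in order to get good forecasts.)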
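For reference, one common way to attack a 12-month seasonal pattern is seasonal differencing: subtracting the value observed in the same month one year earlier. The sketch below is only an illustration on a synthetic series (a linear trend plus a 12-month sine cycle standing in for one house's "Daylight" column); the numbers are made up and the variable names are assumptions, not part of the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one house: 23 monthly values (July 2011 to May 2013),
# built from a linear trend plus a 12-month cycle. This is NOT the real data.
months = pd.date_range("2011-07-01", periods=23, freq="MS")
t = np.arange(23)
x = pd.Series(150.0 + 2.0 * t + 30.0 * np.sin(2 * np.pi * t / 12), index=months)

# Seasonal difference: each value minus the value from the same month a year before.
# The first 12 entries have no predecessor and become NaN, so they are dropped.
seasonal_diff = x.diff(12).dropna()

print(len(seasonal_diff))
print(seasonal_diff.describe())
```

With only 23 months per house, a lag-12 difference leaves 11 observations, so whether this is enough data for a multivariate model afterwards is an open question rather than something this sketch settles.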
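Problem 2
I want to implement a VARMA (or VAR) model to produce the forecast, but writing the model is hard for me and I have not been able to write correct code.
Again, if you have any suggestions other than VARMA or VAR, please let me know.
Below is my code. (I tried to implement a VAR model.)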
import csv
import numpy as np
import statsmodels.tsa.stattools as st
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.tsa.vector_ar as var
import statsmodels.api as sm

data_train = pd.read_csv('../../data/training_dataset_500.csv')
data_test = pd.read_csv('../../data/test_dataset_500.csv')
data_set = pd.read_csv('../../data/dataset_500.csv')
# 23 month-end dates: July 2011 through May 2013
rng = pd.date_range('7/1/2011', '6/1/2013', freq='M')

def mape_of_var(house, predfile, mapefile):
    sum0 = 0
    f_pred = open(predfile, 'w')
    f_mape = open(mapefile, 'w')
    pred_writer = csv.writer(f_pred)
    pred_writer.writerow(['House','EnergyProduction'])
    for i in house:
        # the three series for one house, indexed by month
        data = data_train[data_train.House==i][['EnergyProduction','Daylight','Temperature']].set_index(rng)
        # first difference to remove the trend
        data_diff = data.diff().dropna()
        model = var.var_model.VAR(data_diff)
        # 4 lags, no constant term
        results = model.fit(4, trend='nc')
        lag_order = results.k_ar
        # one-step forecast of the differenced series, added back to the last observed level
        data_pred = results.forecast(data_diff.values[-lag_order:],1)[0,0]+data.EnergyProduction.iloc[-1]
        # accumulate the absolute percentage error against the test value for this house
        sum0 = sum0 + abs(data_pred-data_test.EnergyProduction[i-1])/data_test.EnergyProduction[i-1]
        pred_writer.writerow([i,data_pred])
    mape = round(sum0/len(house),3)
    f_mape.write(str(mape))
    f_pred.close()
    f_mape.close()
    # return the score so that the print call below has something to show
    return mape

print('MAPE(500 houses): ' + str(mape_of_var(range(1,501), 'pred_all.csv','mape_all.txt')))
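The score accumulated in the loop above is a mean absolute percentage error (MAPE). As a standalone helper it could look roughly like the sketch below; `mape` is a hypothetical name and the input numbers are made up purely to show the call.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.1 means 10%).

    Assumes every actual value is positive, as energy production would be.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / actual))

# Made-up values for two houses, only to illustrate usage.
score = mape([100.0, 200.0], [110.0, 180.0])
print(round(score, 3))
```

Keeping the metric separate from the forecasting loop makes it easier to compare a VAR against any other model on the same test values.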
When I run this code, I get the following error:
(DataVizProj)Soma-Suzuki:Soma Suzuki$ python examine.py
Traceback (most recent call last):
File "examine.py", line 41, in <module>
print 'MAPE(500 houses): ' + str(mape_of_var(range(1,501), 'pred_all.csv','mape_all.txt'))
File "examine.py", line 24, in mape_of_var
data = data_smoothed[data_smoothed.House==i][['EnergyProduction','Daylight','Temperature']].set_index(rng)
File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/frame.py", line 2625, in set_index
frame.index = index
File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/generic.py", line 2161, in __setattr__
return object.__setattr__(self, name, value)
File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:42548)
File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/generic.py", line 413, in _set_axis
self._data.set_axis(axis, labels)
File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/internals.py", line 2219, in set_axis
'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 0 elements, new values have 23 elements
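The message says the frame passed to set_index had 0 rows while the date index had 23 entries, so the House filter apparently matched nothing for that iteration. A minimal sketch of a check that would surface this earlier is shown below; `select_house` is a hypothetical helper and the toy frame only stands in for the real CSV, so it does not explain why the real selection came back empty.

```python
import pandas as pd

def select_house(frame, house, index):
    """Return one house's rows indexed by `index`; fail loudly if the counts disagree."""
    columns = ["EnergyProduction", "Daylight", "Temperature"]
    subset = frame.loc[frame["House"] == house, columns]
    if len(subset) != len(index):
        raise ValueError(
            "house %r: %d rows selected, %d index values" % (house, len(subset), len(index))
        )
    return subset.set_index(index)

# Toy frame standing in for the training CSV (values are illustrative only).
toy = pd.DataFrame({
    "House": [1, 1, 1, 2, 2, 2],
    "EnergyProduction": [740, 731, 694, 650, 640, 700],
    "Daylight": [178.9, 169.7, 170.2, 169.1, 169.1, 199.5],
    "Temperature": [26.2, 25.8, 22.8, 16.4, 11.4, 4.2],
})
idx = pd.date_range("2011-07-01", periods=3, freq="MS")

ok = select_house(toy, 1, idx)      # three rows, three dates: works
try:
    select_house(toy, 99, idx)      # no such house: empty selection
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

Printing `frame["House"].dtype` and `frame["House"].unique()` on the real frame would be a natural next step, since a filter can silently match nothing when the column holds values of a different type than the loop variable.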
Any information would be helpful.