Can't build an energy production forecasting model

Time: 2015-07-22 06:17:50

Tags: python pandas matplotlib machine-learning statsmodels

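I am trying to build a model that forecasts energy production using a VARMA (or VAR) model. (If you have any suggestions other than VARMA or VAR, please let me know.)

The data I can use for training is as follows: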

https://github.com/soma11soma11/EnergyDataSimulationChallenge/blob/master/challenge1/data/training_dataset_500.csv

ID  Label   House   Year    Month   Temperature Daylight    EnergyProduction
0     0       1     2011     7         26.2      178.9         740
1     1       1     2011     8         25.8      169.7         731
2     2       1     2011     9         22.8      170.2         694
3     3       1     2011     10        16.4      169.1         688
4     4       1     2011     11        11.4      169.1         650
5     5       1     2011     12         4.2      199.5         763
...............

11995 19     500    2013     2          4.2      201.8         638
11996 20     500    2013     3         11.2        234         778
11997 21     500    2013     4         13.6      237.1         758
11998 22     500    2013     5         19.2      258.4         838
11999 23     500    2013     6         22.7      122.9         586

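As shown above, I can use the data from July 2011 to May 2013 for training. After training, I want to predict the energy production of each of the 500 houses for June 2013.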
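
To make the setup concrete, here is a minimal sketch of splitting the long-format table into one monthly series per house. It assumes the column names shown in the table above; the tiny DataFrame is made up for illustration, and `house_series` is a hypothetical helper, not part of any library.

```python
import pandas as pd

# Made-up miniature of the table above: 2 houses x 3 months (illustrative values).
df = pd.DataFrame({
    "House": [1, 1, 1, 2, 2, 2],
    "Year": [2011, 2011, 2011, 2011, 2011, 2011],
    "Month": [7, 8, 9, 7, 8, 9],
    "Temperature": [26.2, 25.8, 22.8, 25.0, 24.1, 21.9],
    "Daylight": [178.9, 169.7, 170.2, 180.3, 171.0, 168.4],
    "EnergyProduction": [740, 731, 694, 700, 690, 655],
})


def house_series(frame, house):
    """Hypothetical helper: rows for one house, indexed by the first day of each month."""
    sub = frame[frame["House"] == house].copy()
    # Build the dates from the Year/Month columns rather than a fixed date range,
    # so the index length always matches the number of selected rows.
    dates = pd.to_datetime(
        pd.DataFrame({"year": sub["Year"], "month": sub["Month"], "day": 1})
    )
    sub.index = pd.DatetimeIndex(dates.values)
    return sub[["EnergyProduction", "Daylight", "Temperature"]]


h1 = house_series(df, 1)
print(h1)
```

Deriving the index from the data itself avoids having to keep a separately built date range in sync with the number of rows per house.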

Using statsmodels.api.tsa.filters.hpfilter, I managed to remove the trend component from "Daylight".

import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm


df = pd.read_csv('../../data/training_dataset_500.csv')

x = df.loc[df.House==1, 'Daylight']
x_smoothed, x_trend = sm.tsa.filters.hpfilter(x, lamb=129600)
fig, axes = plt.subplots(figsize=(12,4), ncols=3)
axes[0].plot(x)
axes[0].set_title('raw x')
axes[1].plot(x_trend)
axes[1].set_title('trend')
axes[2].plot(x_smoothed)
axes[2].set_title('smoothed x')
plt.show()

[Figure: three panels for the Daylight series of House 1, titled "raw x", "trend", and "smoothed x"]

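I am stuck on two problems here.

Problem 1

I cannot remove the seasonal component.

Although I can remove the trend component, I cannot remove the seasonal component to make the time series stationary. (I need the time series to be stationary in order to get good forecasts.)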
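
One common way to remove a 12-month seasonal pattern is seasonal differencing, that is, subtracting the value observed 12 months earlier. Below is a minimal sketch on a synthetic series (the values are made up, not taken from the dataset). Note that with only 23 monthly observations per house, a lag-12 difference would leave just 11 points, so this illustrates the technique rather than recommending it for this data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a fixed 12-month pattern (made-up values).
idx = pd.date_range("2011-07-01", periods=36, freq="MS")
t = np.arange(36)
seasonal = 10.0 * np.sin(2 * np.pi * t / 12)
x = pd.Series(100.0 + 0.5 * t + seasonal, index=idx)

# Seasonal difference: x[t] - x[t-12]. The periodic term cancels,
# and the linear trend turns into a constant (0.5 * 12 = 6.0).
x_sdiff = x.diff(12).dropna()
print(x_sdiff.describe())
```

The first 12 observations are lost to the differencing, which is why the length of each series matters here.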

Problem 2

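I want to implement a VARMA (or VAR) model to obtain forecasts. But writing the model is difficult for me, and I have not been able to write correct code...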

Again, if you have any suggestions other than VARMA or VAR, please let me know.

Below is my code. (I tried to implement a VAR model.)

import csv
import numpy as np
import statsmodels.tsa.stattools as st
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.tsa.vector_ar as var
import statsmodels.api as sm

data_train = pd.read_csv('../../data/training_dataset_500.csv')
data_test = pd.read_csv('../../data/test_dataset_500.csv')
data_set = pd.read_csv('../../data/dataset_500.csv')

rng=pd.date_range('7/1/2011', '6/1/2013', freq='M')


def mape_of_var(house, predfile, mapefile):
    # Fit one VAR model per house and accumulate the absolute percentage error.
    sum0 = 0
    f_pred = open(predfile, 'w')
    f_mape = open(mapefile, 'w')
    pred_writer = csv.writer(f_pred)
    pred_writer.writerow(['House','EnergyProduction'])
    for i in house:
        # Rows for house i, indexed by the monthly date range rng
        data = data_train[data_train.House==i][['EnergyProduction','Daylight','Temperature']].set_index(rng)
        # First-difference the series to remove the trend
        data_diff = data.diff().dropna()
        model = var.var_model.VAR(data_diff)
        results = model.fit(4, trend='nc')
        lag_order = results.k_ar
        # One-step forecast of the differenced series, added back to the last observed level
        data_pred = results.forecast(data_diff.values[-lag_order:],1)[0,0]+data.EnergyProduction[-1]
        sum0 = sum0 + abs(data_pred-data_test.EnergyProduction[i-1])/data_test.EnergyProduction[i-1]
        pred_writer.writerow([i,data_pred])

    # Mean absolute percentage error over all houses
    mape = round(sum0/len(house),3)
    f_mape.write(str(mape))
    f_pred.close()
    f_mape.close()
    return mape


print 'MAPE(500 houses): ' + str(mape_of_var(range(1,501), 'pred_all.csv','mape_all.txt'))

When I run this code, I get the following error:

  (DataVizProj)Soma-Suzuki:Soma Suzuki$ python examine.py
Traceback (most recent call last):
  File "examine.py", line 41, in <module>
    print 'MAPE(500 houses): ' + str(mape_of_var(range(1,501), 'pred_all.csv','mape_all.txt'))
  File "examine.py", line 24, in mape_of_var
    data = data_smoothed[data_smoothed.House==i][['EnergyProduction','Daylight','Temperature']].set_index(rng)
  File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/frame.py", line 2625, in set_index
    frame.index = index
  File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/generic.py", line 2161, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:42548)
  File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/generic.py", line 413, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Users/Suzuki/Envs/DataVizProj/lib/python2.7/site-packages/pandas/core/internals.py", line 2219, in set_axis
    'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 0 elements, new values have 23 elements
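
Note that the failing line in the traceback selects from `data_smoothed`, while the code posted above selects from `data_train`, so the traceback appears to come from a different version of the script. "Expected axis has 0 elements" indicates that the selection for that house returned no rows, while the date range has 23 entries. A minimal sketch that reproduces this situation with a made-up frame (not the real dataset):

```python
import pandas as pd

# Made-up frame that contains only house 2, so selecting house 1 yields no rows.
df = pd.DataFrame({"House": [2, 2, 2], "EnergyProduction": [700, 690, 655]})
rng = pd.date_range("2011-07-01", periods=23, freq="MS")

selected = df[df["House"] == 1][["EnergyProduction"]]
print(len(selected), len(rng))

# Setting a 23-element index on an empty selection raises a ValueError.
try:
    selected.set_index(rng)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

The exact wording of the message differs between pandas versions, but in each case it reports a length mismatch between the selected rows and the new index. Checking `len(selected)` before calling `set_index` would show whether the frame being filtered still has a usable `House` column.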

Any information would be helpful.

0 answers:

No answers yet