Accurately predicting the wrong value

Asked: 2016-07-30 22:51:37

Tags: python machine-learning scikit-learn regression

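I am a Python enthusiast and an ML noob, and I thought I would see whether I could do a bit of stock market prediction. I am by no means educated in this field and I am no maths boff. However, I have managed to get a very accurate prediction of the wrong value, and I am hoping some great ML guru will point me in the right direction.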

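First, in my main dataframe (data), I calculate Bollinger Bands, three DEMAs (12, 26, 100) and MACD. Then I shift the Close column of my stock data by -15 into a "predict" column, which should be the value I want to predict: the Close 15 ticks into the future.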
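As a minimal sketch of the shift step (synthetic `Close` values and a made-up 15-minute index stand in for the real data, which is not shown here; the constant name `HORIZON` is introduced only for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real price data (assumption: 15-minute bars).
index = pd.date_range("2016-05-01", periods=40, freq="15min")
data = pd.DataFrame({"Close": np.arange(500.0, 540.0)}, index=index)

HORIZON = 15  # number of ticks to look ahead

# shift(-15) moves each value 15 rows up, so row i holds the Close from row i + 15.
data["predict"] = data["Close"].shift(-HORIZON)

# The last 15 rows have no future value available, so they become NaN.
print(data.tail(HORIZON + 1))
```

Note that the final 15 rows of any frame shifted this way carry NaN in `predict`, so they cannot be used as targets.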

Then I split the last 200 rows off into a separate dataframe (data1), which I will use to test my model.

Here is how it goes:
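The hold-out split could look roughly like this. It is only a sketch on a synthetic frame: whether the 200 rows were also removed from `data` is not stated above, so dropping them here is an assumption, and `HOLDOUT` is a name made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in frame; the real `data` holds prices and indicators.
data = pd.DataFrame({"Close": np.linspace(500.0, 550.0, 1000)})

HOLDOUT = 200  # size of the hold-out set described above

# Take the last 200 rows as the hold-out frame...
data1 = data.iloc[-HOLDOUT:].copy()
# ...and (assumption) keep only the earlier rows for training and validation.
data = data.iloc[:-HOLDOUT].copy()

print(len(data), len(data1))
```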

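import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.tree import ExtraTreeRegressor

# Create an empty dataframe with only the index of the original
test_data = pd.DataFrame(index=data.index)
# express each band and moving average as a fraction of Close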
test_data['b_upper'] = data['b_upper'] / data['Close']
test_data['b_middle'] = data['b_middle'] / data['Close']
test_data['b_lower'] = data['b_lower'] / data['Close']
test_data['dema1'] = data['dema1'] / data['Close']
test_data['dema2'] = data['dema2'] / data['Close']
test_data['dema3'] = data['dema3'] / data['Close']
# keep MACD
test_data['macd'] = data['macd']
test_data['macdsignal'] = data['macdsignal']
test_data['macdhist'] = data['macdhist']
# express predict (Close 15 ticks ahead) as a fraction of the current Close
test_data['predict'] = data['predict'] / data['Close']

split = int( len(test_data) * 0.8 )
train, test = test_data[:split], test_data[split:]
Xcols = ['b_upper', 'b_middle', 'b_lower', 'dema1', 'dema2', 'dema3', 'macd', 'macdsignal', 'macdhist']
ycols = ['predict']

train_X = train[Xcols].values
train_y = train[ycols].values.ravel()
test_X = test[Xcols].values
test_y = test[ycols].values.ravel()

data_1 = pd.DataFrame(index=data1.index)

# express each band and moving average as a fraction of Close
data_1['b_upper'] = data1['b_upper'] / data1['Close']
data_1['b_middle'] = data1['b_middle'] / data1['Close']
data_1['b_lower'] = data1['b_lower'] / data1['Close']
data_1['dema1'] = data1['dema1'] / data1['Close']
data_1['dema2'] = data1['dema2'] / data1['Close']
data_1['dema3'] = data1['dema3'] / data1['Close']
# keep MACD
data_1['macd'] = data1['macd']
data_1['macdsignal'] = data1['macdsignal']
data_1['macdhist'] = data1['macdhist']
# convert predict to a percentage of Close
data_1['predict'] = data1['predict'] / data1['Close']
data_X = data_1[Xcols].values

reg = AdaBoostRegressor( ExtraTreeRegressor(), n_estimators=100, random_state=np.random.RandomState(1) )
reg.fit(train_X, train_y)
p = reg.predict(test_X)

print('AdaBoostRegressor')
print('Score:', reg.score(test_X, test_y))  # R^2 on the 20% test split
print('Mean Sq Err:', mean_squared_error(test_y, p))
print('Obj:', get_size(reg), 'bytes')  # get_size: helper defined elsewhere (not shown)

Returns:

AdaBoostRegressor
Score: -0.064949432102
Mean Sq Err: 0.000173545187532
Obj: 131550 bytes
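For context on the `Score` line: `score` on a scikit-learn regressor returns the coefficient of determination R², where 1.0 is a perfect fit, 0.0 is what you get by always predicting the mean of the targets, and negative values are worse than that baseline. A minimal NumPy sketch of the calculation follows; the numbers are made up for illustration and are not taken from my data, and the helper name `r2` is hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up target ratios (future Close / current Close), for illustration only.
y_true = np.array([1.02, 0.98, 1.05, 0.97, 1.01])

# Baseline: always predict the mean of the targets, giving R^2 of 0 (up to rounding).
baseline = np.full_like(y_true, y_true.mean())
print("mean baseline:", r2(y_true, baseline))

# A constant guess of 1.0 ("price will not change") gives a slightly negative R^2 here.
no_change = np.ones_like(y_true)
print("no-change guess:", r2(y_true, no_change))
```

Read this way, a score of about -0.06 means the model explains none of the variance in the test targets, even though the mean squared error looks small in absolute terms.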

So far so good, but when I predict with it, it very accurately reproduces the current Close rather than the Close 15 ticks into the future:

p = reg.predict(data_X)
data1['predicted'] = data1['Close'] * p
data1[['Close','predict', 'predicted']].head(6)

Returns:

                    Close   predict predicted
datetime            
2016-05-28 18:15:00 501.00  532.98  501.890391
2016-05-28 18:30:00 501.57  537.34  499.006097
2016-05-28 18:45:00 500.28  532.00  499.654848
2016-05-28 19:00:00 500.28  535.39  499.897474
2016-05-28 19:15:00 501.97  535.23  499.415860
2016-05-28 19:30:00 501.12  531.74  502.4060
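Since `predicted` is `Close * p`, dividing it back by `Close` recovers the raw model output. A small sketch using only the six rows shown above (values copied by hand from the table; the column names `target_ratio` and `model_ratio` are introduced here purely for illustration):

```python
import pandas as pd

# The six rows from the output above, copied by hand.
rows = pd.DataFrame(
    {
        "Close": [501.00, 501.57, 500.28, 500.28, 501.97, 501.12],
        "predict": [532.98, 537.34, 532.00, 535.39, 535.23, 531.74],
        "predicted": [501.890391, 499.006097, 499.654848,
                      499.897474, 499.415860, 502.4060],
    }
)

# Ratio the model was trained to output: future Close / current Close.
rows["target_ratio"] = rows["predict"] / rows["Close"]
# Ratio the model actually produced: predicted / Close (this is `p`).
rows["model_ratio"] = rows["predicted"] / rows["Close"]

print(rows[["target_ratio", "model_ratio"]].round(4))
```

On these rows the model's ratio stays within roughly 0.6% of 1.0, while the target ratio is between about 1.06 and 1.07, which is consistent with `predicted` tracking `Close` instead of `predict`.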

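Plotted out, the chart shows how the prediction (red) follows the actual Close (light blue), whereas I was expecting it to follow the "predict" value (green): [plot]

I have tried this just about every way I could find through Google, but I still end up predicting the present instead of the future. I am pretty sure I am missing something basic, and I would really appreciate your advice to help this noob better understand what needs to be done to predict the future.

Thanks, J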

0 answers:

No answers