Sklearn linear regression predicts incorrect values

Time: 2016-12-31 04:49:24

Tags: python python-3.x scikit-learn

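I am trying to build a model with sklearn to predict stock market prices. I know stock prediction is very hard, but I think the results I am getting are plainly wrong. What seems to be happening is that the model predicts today's closing price even though I set the label to tomorrow's closing price. Here is my code. It is only a skeleton that I will improve later.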

from sklearn.svm import SVR
from sklearn import linear_model
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import column
import shelve
import math
import numpy as np
import random
import warnings
warnings.filterwarnings("ignore")

d = shelve.open("shelve.slv")
def get_data(ticker):
    stock = d[ticker]
    return stock

def shuffle_two(arr1, arr2):
    # Shuffle both sequences with the same random permutation of indices.
    list1_shuf = []
    list2_shuf = []
    index_shuf = list(range(len(arr1)))
    random.shuffle(index_shuf)
    for i in index_shuf:
        list1_shuf.append(arr1[i])
        list2_shuf.append(arr2[i])

    return list1_shuf, list2_shuf

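def setup_data(stock, split, backtest=True):
    # Daily high-low range as a percentage of the low.
    stock['HL_PCT'] = ((stock['High'] - stock['Low']) / stock['Low']) * 100

    # Copy so that adding a column does not raise SettingWithCopyWarning.
    stock = stock[['Close', 'Volume', 'HL_PCT']].copy()

    # Label each row with the next day's close; the last row gets NaN.
    stock['Labels'] = stock['Close'].shift(-1)

    # Features for every row, including the last (unlabelled) one.
    x = np.array(stock.drop(['Labels'], axis=1))
    stock = stock.dropna()
    y = np.array(stock['Labels'])

    # The last row has no label yet: set it aside as the row to predict from.
    future_x = x[-1]
    x = x[:-1]

    # Hold out the last `split` labelled rows for testing.
    train_x = list(x[:-split])
    train_y = list(y[:-split])
    test_x = list(x[-split:])
    test_y = list(y[-split:])

    return train_x, train_y, test_x, test_y, future_x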



stock = get_data('GOOGL')
train_x, train_y, test_x, test_y, future_x = setup_data(stock, 1)

model = linear_model.LinearRegression()
print(train_x[2])
print(train_y[2])
model.fit(train_x, train_y)
# predict() expects a 2D array of shape (n_samples, n_features).
print(model.predict(future_x.reshape(1, -1)))
print(future_x)


#print (stock["Close"][-10:])
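
One way to check the label alignment that setup_data relies on is to run the same steps on a tiny made-up table. The sketch below is only an illustration: the `toy` DataFrame and its prices are invented, not GOOGL data. It suggests that `shift(-1)` pairs each row with the next row's Close, and that the last row, which has no label, is the one kept aside as `future_x`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices, invented for illustration only.
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0]})

# Same labelling step as in setup_data: tomorrow's close becomes the label.
toy["Labels"] = toy["Close"].shift(-1)

# Features for every row, including the last one, which has no label yet.
x = toy[["Close"]].to_numpy()
# Labels only for rows that have a known "tomorrow".
y = toy["Labels"].dropna().to_numpy()

future_x = x[-1]  # latest row: features only, label unknown
x = x[:-1]        # rows that do have a label

print(x.ravel())  # today's closes for the labelled rows
print(y)          # the matching next-day closes
print(future_x)   # the row that would be passed to predict()
```

If each printed label is the close of the row after its feature, the labels are aligned as intended and the problem probably lies elsewhere.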

Whenever I print the prediction, I also print the input row I predicted from. Here is the output:

[ 791.95367399]
[  7.92450012e+02   1.72830000e+06   1.73121034e+00]

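The features are the closing price, the volume, and the percentage difference between the high and the low. But if you look closely, the model predicts 791.95 while today's closing price is 792.45. That is suspiciously close, especially since the stock price typically moves by more than 50 cents a day. Do you see what I am doing wrong?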
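
One possible explanation, offered as a hypothesis rather than a diagnosis of the code above: if daily closes behave roughly like a random walk, the best linear guess for tomorrow's close is very near today's close. In that case a prediction close to 792.45 would be expected even with correctly aligned labels. The sketch below uses a synthetic random walk, not real stock data. The seed, starting price, and step size are arbitrary assumptions. It fits a LinearRegression on today's value to predict the next one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic random walk standing in for a price series (assumed values).
rng = np.random.default_rng(0)
steps = rng.normal(loc=0.0, scale=5.0, size=2000)
close = 800.0 + np.cumsum(steps)

# Feature: today's close. Label: the next day's close.
x = close[:-1].reshape(-1, 1)
y = close[1:]

model = LinearRegression().fit(x, y)

# Predict the step after the last observed value.
latest = close[-1]
prediction = model.predict(np.array([[latest]]))[0]

print("coefficient on today's close:", model.coef_[0])
print("today's close:", latest)
print("predicted next close:", prediction)
```

On a series like this the fitted coefficient tends to sit near 1, so the prediction lands close to the latest value. A more telling check on real data might be to compare the model's test error against the naive baseline that simply repeats today's close.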

0 answers:

No answers