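I'm trying to build a model with scikit-learn to predict stock prices. I know stock market prediction is very hard, but I think the results I'm getting are clearly wrong. What seems to be happening is that the model predicts today's closing price, even though I set the labels to tomorrow's closing price. Here is my code. It's meant to be a skeleton that I'll improve later.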
from sklearn.svm import SVR
from sklearn import linear_model
# vplot is no longer available in current Bokeh releases; column replaces it.
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import column
import shelve
import math
import numpy as np
import random
import warnings

warnings.filterwarnings("ignore")

# The shelve file maps each ticker symbol to its stored price data.
d = shelve.open("shelve.slv")


def get_data(ticker):
    # Look up the stored price data for one ticker.
    stock = d[ticker]
    return stock


def shuffle_two(arr1, arr2):
    # Shuffle two equal-length sequences with the same random permutation.
    list1_shuf = []
    list2_shuf = []
    index_shuf = list(range(len(arr1)))
    random.shuffle(index_shuf)
    for i in index_shuf:
        list1_shuf.append(arr1[i])
        list2_shuf.append(arr2[i])
    return list1_shuf, list2_shuf
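

def setup_data(stock, split, backtest=True):
    # Feature: the day's high-low range as a percentage of the low.
    stock['HL_PCT'] = ((stock['High'] - stock['Low']) / stock['Low']) * 100
    stock = stock[['Close', 'Volume', 'HL_PCT']].copy()
    # Label: the next day's close. The last row has no label (NaN).
    stock['Labels'] = stock['Close'].shift(-1)
    x = np.array(stock.drop(columns=['Labels']))
    stock.dropna(inplace=True)
    y = np.array(stock['Labels'])
    # The last feature row has no label yet, so keep it for the real forecast.
    future_x = x[-1]
    x = x[:-1]
    # Chronological split: the last `split` rows are held out for testing.
    train_x = list(x[:-split])
    train_y = list(y[:-split])
    test_x = list(x[-split:])
    test_y = list(y[-split:])
    return train_x, train_y, test_x, test_y, future_x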


stock = get_data('GOOGL')
train_x, train_y, test_x, test_y, future_x = setup_data(stock, 1)
model = linear_model.LinearRegression()
print(train_x[2])
print(train_y[2])
model.fit(train_x, train_y)
# predict expects a 2D array of shape (n_samples, n_features).
print(model.predict(future_x.reshape(1, -1)))
print(future_x)
#print(stock["Close"][-10:])
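
The labeling step in setup_data can be checked in isolation. The following is only a minimal sketch, assuming a made-up five-row frame: the `toy` data and the helper `labels_are_next_close` are hypothetical and not part of the script above. It applies the same `shift(-1)` call and confirms that each label equals the next row's close.

```python
import numpy as np
import pandas as pd

# Toy closing prices (made-up values, not real GOOGL data).
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]})

# Same labeling step as in setup_data: tomorrow's close becomes today's label.
toy["Labels"] = toy["Close"].shift(-1)


def labels_are_next_close(frame):
    """Return True if every non-missing label equals the next row's Close."""
    labels = frame["Labels"].to_numpy()[:-1]    # the last label is NaN by construction
    next_close = frame["Close"].to_numpy()[1:]  # closes from the second row onward
    return bool(np.array_equal(labels, next_close))


print(toy)
print(labels_are_next_close(toy))
```

If this check passes on the real data as well, the labels themselves are probably aligned correctly and the cause lies elsewhere.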
Whenever I print the prediction, I also print the feature row it was predicted from. Here is the output of the main script:
[ 791.95367399]
[ 7.92450012e+02 1.72830000e+06 1.73121034e+00]
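The features are the closing price, the volume, and the percentage difference between the high and the low. But if you look, the model predicts 791.95 while today's close is 792.45. That is suspiciously close, especially given that the stock usually moves by well over 50 cents a day. Can you see what I'm doing wrong?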
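
One way to judge whether that closeness is really suspicious is to compare a linear fit against a naive "tomorrow equals today" baseline. The sketch below is an illustration under stated assumptions, not a diagnosis of the real data: the prices are a synthetic random walk generated with NumPy, the only feature is today's close, and the names `close`, `naive_pred`, `model_mae`, and `naive_mae` are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic random-walk "closing prices" (an assumption for illustration only).
rng = np.random.default_rng(0)
close = 800.0 + np.cumsum(rng.normal(0.0, 5.0, size=500))

# Feature: today's close. Label: tomorrow's close.
x = close[:-1].reshape(-1, 1)
y = close[1:]

# Hold out the last 100 days, mirroring a chronological train/test split.
split = 100
train_x, test_x = x[:-split], x[-split:]
train_y, test_y = y[:-split], y[-split:]

model = LinearRegression().fit(train_x, train_y)
model_pred = model.predict(test_x)

# Naive baseline: predict that tomorrow's close equals today's close.
naive_pred = test_x[:, 0]

# Mean absolute error of each approach on the held-out days.
model_mae = np.mean(np.abs(model_pred - test_y))
naive_mae = np.mean(np.abs(naive_pred - test_y))

print("coefficient on today's close:", model.coef_[0])
print("model MAE:", model_mae)
print("naive MAE:", naive_mae)
```

If the fitted coefficient on today's close comes out near 1 and the two errors are similar, a prediction that sits close to today's close may simply be what a linear model learns from this kind of data, rather than a sign that the labels are wrong. Running the same comparison on the real features would be needed to say anything about the GOOGL result.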