Python linear regression always gives 100% accuracy

Time: 2018-12-13 18:31:34

Tags: python machine-learning scikit-learn artificial-intelligence linear-regression

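Hi, I have a problem with my exam project. I'm trying to build a very simple stock predictor using a web API called IEX Trading (Iextrading), which returns the last 5 years of Tesla's stock data in JSON format, nothing fancy. Then I want to be able to predict the stock price for tomorrow (the next day). However, I have to admit that I feel very lost when it comes to machine learning. I think I have successfully created the AI model, but it always reports 100% accuracy (an R2 score of 1.0), and I know that is not true/possible. Honestly, I don't even know where to look for the problem; I assume it has something to do with the test/training data. And I guess once that is fixed, I need to figure out how to give the model only the date as input for the prediction.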
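To predict the next day's close, each row's target has to be a future value rather than a column from the same row. A minimal sketch, assuming a date-sorted series of closing prices: synthetic random-walk prices stand in for the IEX download, and the helper `make_next_day_frame` and its `n_lags` parameter are hypothetical names chosen for illustration. It uses recent closes as features, `shift(-1)` to build the "tomorrow" target, and a chronological (unshuffled) train/test split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error


def make_next_day_frame(close: pd.Series, n_lags: int = 5) -> pd.DataFrame:
    """Hypothetical helper. Features: today's close and the n_lags - 1 closes
    before it. Target: the next day's close."""
    frame = pd.DataFrame({f"lag_{k}": close.shift(k) for k in range(n_lags)})
    frame["target"] = close.shift(-1)  # tomorrow's close
    return frame.dropna()  # drop rows with missing history or no "tomorrow"


# Synthetic random-walk prices stand in for the downloaded data (assumption).
rng = np.random.default_rng(0)
close = pd.Series(150 + np.cumsum(rng.normal(0, 1, size=1258)), name="close")

frame = make_next_day_frame(close)
X = frame.drop(columns="target")
y = frame["target"]

# Chronological split: first 80 % for training, the rest for testing.
split = int(0.8 * len(frame))
model = LinearRegression().fit(X.iloc[:split], y.iloc[:split])
pred = model.predict(X.iloc[split:])
print("Test MAE:", round(mean_absolute_error(y.iloc[split:], pred), 2))
```

With this framing the test error should stay well above zero, which is what an honest next-day forecast looks like. To forecast the day after the last row, you would build the same `lag_*` columns from the most recent closes and pass them to `model.predict`; note that the input here is recent prices, not the date alone.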

Here is my code, thanks in advance:

import matplotlib 
import matplotlib.pyplot as plt 
import numpy as np 
from sklearn import datasets, linear_model 
import sklearn.metrics as sm
import pandas as pd 

data = pd.read_json('https://api.iextrading.com/1.0/stock/tsla/chart/5y')
print(data.head())

from sklearn import preprocessing

# Encode the dates as integer codes
enc = preprocessing.LabelEncoder()
data['date'] = enc.fit_transform(data['date'])

# Label is a date expression, e.g. "Dec 13", "Nov 12"
enc2 = preprocessing.LabelEncoder()
data['label'] = enc2.fit_transform(data['label'])

# Features: every column except the target; target: the closing price
X = data.drop('close', axis=1)
y = data['close']

# Split into train and test sets (chronologically, no shuffling)
num_training = int(0.8 * len(X))
num_test = len(X) - num_training

# Training data
X_train, y_train = X[:num_training], y[:num_training]

# Test data
X_test, y_test = X[num_training:], y[num_training:]

# Create linear regressor object
regressor = linear_model.LinearRegression()

# Train the model using the training sets
regressor.fit(X_train, y_train)

# Predict the output
y_test_pred = regressor.predict(X_test)

# Compute performance metrics
print("Linear regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2)) 
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2)) 
print("Explain variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))

# Perform prediction on the training data for comparison
y_train_pred = regressor.predict(X_train)
print("\nTrain R2 score =", round(sm.r2_score(y_train, y_train_pred), 2))
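A likely culprit is that `X = data.drop('close', axis=1)` still contains columns computed from the same day's close. If `changeOverTime` is each close's relative change from the first close in the 5-year window (an assumption, based on it being 0.000000 in data entry 0), then close = first_close × (1 + changeOverTime), an exactly linear relationship that LinearRegression can fit perfectly. The sketch below reproduces this on a small synthetic frame (column definitions assumed, values random; `holdout_r2` is a hypothetical helper) and shows the score collapsing once that column is removed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
close = 150 + np.cumsum(rng.normal(0, 1, size=1258))

# Synthetic frame mimicking two of the API's columns (assumed definitions).
data = pd.DataFrame({
    "close": close,
    "changeOverTime": close / close[0] - 1,  # relative to the first close
    "volume": rng.integers(5_000_000, 15_000_000, size=close.size),
})


def holdout_r2(features):
    """Fit on the first 80 % of rows and return R2 on the last 20 %."""
    X, y = data[features], data["close"]
    split = int(0.8 * len(data))
    model = LinearRegression().fit(X.iloc[:split], y.iloc[:split])
    return r2_score(y.iloc[split:], model.predict(X.iloc[split:]))


print("with changeOverTime:", round(holdout_r2(["changeOverTime", "volume"]), 4))
print("volume only:        ", round(holdout_r2(["volume"]), 4))
```

A similar concern applies to `open`, `high`, `low` and `vwap`: they are only known once the trading day is over, so for a next-day forecast only information available before the target day should go into `X`.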

Here is a sample of the data:

Data columns (total 12 columns):
change              1258 non-null float64
changeOverTime      1258 non-null float64
changePercent       1258 non-null float64
close               1258 non-null float64
date                1258 non-null datetime64[ns]
high                1258 non-null float64
label               1258 non-null object
low                 1258 non-null float64
open                1258 non-null float64
unadjustedVolume    1258 non-null int64
volume              1258 non-null int64
vwap                1258 non-null float64
dtypes: datetime64[ns](1), float64(8), int64(2), object(1)

Example values from data entry 0:
change: 0.184
changeOverTime: 0.000000
changePercent: 0.125
close: 147.654
date: 2013-12-13
high: 151.80
label: Dec 13, 13
low: 147.3200
open: 148.05
unadjustedVolume: 10599775
volume: 10599775
vwap: 149.5224
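The sample row is consistent with `change` being the difference from the previous day's close and `changePercent` being that difference expressed in percent. This is an inference from these numbers, not from the API documentation, and it suggests these columns are derived from the very `close` value being predicted. A quick arithmetic check on entry 0:

```python
# Values copied from data entry 0 above.
change = 0.184
change_percent = 0.125
close = 147.654

# Assumed definition: change = close - previous close.
previous_close = close - change
# Assumed definition: changePercent = 100 * change / previous close.
implied_percent = 100 * change / previous_close

print(round(previous_close, 3))   # 147.47
print(round(implied_percent, 3))  # 0.125
```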

0 answers:

No answers