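Hi, I have a problem with my exam project. I'm trying to build a very simple stock predictor using a web API called Iextrading, which returns the last 5 years of Tesla's stock data to me in JSON format. Nothing fancy. I then want to be able to predict tomorrow's (the next trading day's) stock price. I have to admit, though, that I feel very lost with machine learning. I think I have managed to create the model, but it always reports 100% accuracy, which I know isn't true or even possible. Honestly, I don't even know where to look for the problem; I'm guessing it has something to do with the training/test data. And once that is fixed, I think I still need to figure out how to give the model only a date as the input for a prediction.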
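One common way to frame "predict tomorrow" is to shift the target: each row's features use only values known at that day's close, and the target is the next row's close. The sketch below is an illustration under assumptions, not the asker's data: it uses a synthetic random-walk price series instead of the API response, borrows the column names `close` and `changePercent` from the data sample, and adds a hypothetical `next_close` target column. It also scores a naive baseline (tomorrow's close equals today's close), which a model should beat before its score means much.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the API data (assumption): a random-walk closing price
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"date": pd.bdate_range("2013-12-13", periods=n)})
df["close"] = 150 + np.cumsum(rng.normal(0, 2, size=n))
# Percent change versus the previous close, like the changePercent column
df["changePercent"] = df["close"].diff() / df["close"].shift(1) * 100

# Hypothetical target: the NEXT trading day's close (NaN on the last row)
df["next_close"] = df["close"].shift(-1)
features = ["close", "changePercent"]

# Keep rows with complete features and a known target; hold the last row aside
labeled = df.dropna(subset=features + ["next_close"])
latest = df.iloc[[-1]][features]  # today's features; tomorrow is still unknown

# Chronological 80/20 split, no shuffling
split = int(0.8 * len(labeled))
train, test = labeled.iloc[:split], labeled.iloc[split:]

model = LinearRegression().fit(train[features], train["next_close"])
model_mae = mean_absolute_error(test["next_close"], model.predict(test[features]))
# Naive baseline: predict that tomorrow's close equals today's close
naive_mae = mean_absolute_error(test["next_close"], test["close"])
tomorrow_pred = model.predict(latest)[0]

print("model MAE:", round(model_mae, 3))
print("naive MAE:", round(naive_mae, 3))
print("prediction for the day after", df["date"].iloc[-1].date(), ":", round(tomorrow_pred, 2))
```

On a pure random walk the model should do roughly no better than the naive baseline, which is the honest outcome to expect from price-only features.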
Here is my code. Thanks in advance:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
import sklearn.metrics as sm
import pandas as pd
data = pd.read_json('https://api.iextrading.com/1.0/stock/tsla/chart/5y')
data.head()
from sklearn import preprocessing
enc = preprocessing.LabelEncoder()
enc.fit(data['date'])
data['date'] = enc.transform(data['date'])
# 'label' holds the date as display text, e.g. "Dec 13, 13"
enc2 = preprocessing.LabelEncoder()
enc2.fit(data['label'])
data['label'] = enc2.transform(data['label'])
# Features: every column except the target 'close'
X = data.drop('close', axis=1)
y = data['close']
# Split in train and test
num_training = int(0.8 * len(X))
num_test = len(X) - num_training
# Training data
X_train, y_train = X[:num_training], y[:num_training]
# Test data
X_test, y_test = X[num_training:], y[num_training:]
# Create linear regressor object
regressor = linear_model.LinearRegression()
# Train the model using the training sets
regressor.fit(X_train, y_train)
# Predict the output
y_test_pred = regressor.predict(X_test)
# Compute performance metrics
print("Linear regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explain variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
# Predict on the training data too, to compare training and test error
y_train_pred = regressor.predict(X_train)
print("\nTraining mean absolute error =", round(sm.mean_absolute_error(y_train, y_train_pred), 2))
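A perfect R2 usually means one of the features gives the target away. Here `X` is every column except `close`, so it still contains `changeOverTime`. Assuming IEX defines `changeOverTime` as the close relative to the first close in the window (row 0 showing 0.000000 fits that), then `close = first_close * (1 + changeOverTime)` exactly, and a linear regression can reproduce it perfectly. The same-day `open`, `high`, `low`, `vwap` and `change` values are also not known before the close you want to predict. The sketch below reproduces the effect on synthetic data; the random-walk prices and the `noise` column are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
df = pd.DataFrame({"close": 150 + np.cumsum(rng.normal(0, 2, size=500))})
# Assumed definition: close relative to the first close in the window
df["changeOverTime"] = df["close"] / df["close"].iloc[0] - 1
df["noise"] = rng.normal(size=len(df))  # hypothetical unrelated feature

split = int(0.8 * len(df))
y = df["close"]

# Same pattern as the question: every column except 'close' is a feature
X = df.drop("close", axis=1)
leaky = LinearRegression().fit(X.iloc[:split], y.iloc[:split])
leaky_r2 = r2_score(y.iloc[split:], leaky.predict(X.iloc[split:]))

# Without the leaking column the score collapses
X_clean = df[["noise"]]
clean = LinearRegression().fit(X_clean.iloc[:split], y.iloc[:split])
clean_r2 = r2_score(y.iloc[split:], clean.predict(X_clean.iloc[split:]))

print("R2 with changeOverTime:", round(leaky_r2, 4))
print("R2 without it:", round(clean_r2, 4))
```

Dropping columns that are derived from the same day's close, or building a next-day target as in a shifted-target setup, removes this shortcut.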
Here is a sample of the data:
Data columns (total 12 columns):
change 1258 non-null float64
changeOverTime 1258 non-null float64
changePercent 1258 non-null float64
close 1258 non-null float64
date 1258 non-null datetime64[ns]
high 1258 non-null float64
label 1258 non-null object
low 1258 non-null float64
open 1258 non-null float64
unadjustedVolume 1258 non-null int64
volume 1258 non-null int64
vwap 1258 non-null float64
dtypes: datetime64[ns](1), float64(8), int64(2), object(1)
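Since `date` is already a `datetime64` column, one option for a model that takes only the date as input is to encode it as the number of days since the first row rather than with `LabelEncoder`. `LabelEncoder` only knows the values it was fitted on, so it rejects tomorrow's date, while an elapsed-days number extends to any future date. This is a minimal sketch with three hard-coded dates (an assumption, starting from row 0's date):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Three trading days, starting from row 0's date in the sample
dates = pd.Series(["2013-12-13", "2013-12-16", "2013-12-17"])

# LabelEncoder assigns each value it has seen an integer code...
enc = LabelEncoder().fit(dates)
codes = enc.transform(dates)
print(codes)

# ...but it cannot encode a date it never saw, such as tomorrow
try:
    enc.transform(["2013-12-18"])
    unseen_rejected = False
except ValueError as err:
    unseen_rejected = True
    print("LabelEncoder:", err)

# Elapsed days since the first date works for past and future dates alike
as_dt = pd.to_datetime(dates)
start = as_dt.min()
days = (as_dt - start).dt.days.tolist()
tomorrow_days = (pd.Timestamp("2013-12-18") - start).days
print(days, tomorrow_days)
```

Note that weekends show up as gaps in the elapsed-days values, which is one reason a plain row counter and a calendar-day counter behave differently.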
# Example values from row 0:
change : 0.184
changeOverTime: 0.000000
changePercent: 0.125
close: 147.654
date: 2013-12-13
high: 151.80
label: Dec 13, 13
low: 147.3200
open: 148.05
unadjustedVolume: 10599775
volume: 10599775
vwap: 149.5224
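The row-0 values are consistent with two relations worth checking before choosing features: `changePercent` matches `change` divided by the previous close (`close - change`), and `changeOverTime` being 0.0 on the first row fits a definition relative to the first close. Both definitions are inferred from this one sample rather than from IEX documentation, so treat them as assumptions to confirm on more rows:

```python
# Values copied from row 0 of the sample above
change = 0.184
close = 147.654
change_percent = 0.125

# Assumed definition: percent change versus the previous day's close
previous_close = close - change
implied_percent = change / previous_close * 100
print(round(implied_percent, 3), "vs changePercent", change_percent)

# Assumed definition: close relative to the first close in the window
first_close = close  # row 0 is the first row of the 5-year window
implied_change_over_time = close / first_close - 1
print(implied_change_over_time, "vs changeOverTime 0.000000")
```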