LinearRegression预测以列表形式提供多个功能

时间:2019-02-28 15:46:49

标签: python machine-learning scikit-learn linear-regression

我正在使用简单的LinearRegression()来预测第二天的收盘价。我知道这是不可靠的,但至少我正在尝试理解和培训LR。

我只是提供最新的open, high, low, close值作为特征。我真正想做的是提供最近10天的open, high, low, close价格。在下面您可以找到我到目前为止所做的事情:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, median_absolute_error
pd.options.mode.chained_assignment = None

sym = "EURUSD"
period = "1d"
fl = "./feed/{} {}.csv".format(period, sym)
dg = 0.0001 # pip digits

df = pd.read_csv(fl)
df['Date'] = pd.to_datetime(df['Date'])
df['NextClose'] = df['Close'].shift(-1)
cols = ['Open', 'High', 'Low', 'Close'] # features
prd_cols = ['NextClose'] # prediction
real_prd = df[cols].iloc[-1:] # predict this after training
hour = df['Date'].iloc[-1]
df.dropna(inplace=True)

test_p = 20 # percent of test size
total = len(df) # total in DataFrame
test_size = int(total * test_p / 100)
X = df[cols]
y = df[prd_cols]
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = int(np.sqrt(mse) / dg)
mae = int(mean_absolute_error(y_test, y_pred) / dg)
print("")
print("Pair:", sym, "Time:", period, hour)
print("Mean Sq. Err: {:.10f}".format(mse))
print("Root Mean Sq. Err:", rmse, "pips")
print("Mean Abs. Err:", mae, "pips")
print("Score: {:.2f}%".format(np.round(lr.score(X_test, y_test) * 100, 2)))
# predict last value
real_pred = lr.predict(real_prd)
print("Prediction:", np.round(real_pred[0][0], 5))

plt.scatter(y_pred, y_test)
plt.show()

编辑 输出:

Pair: EURUSD Time: 1d 2019-02-28 00:00:00
Mean Sq. Err: 0.0000594541
Root Mean Sq. Err: 77 pips
Mean Abs. Err: 56 pips
Score: 99.78%
Prediction: 1.14154

样品来源:

         Open     High      Low    Close
4921  1.14087  1.14092  1.13610  1.13658
4922  1.13658  1.13678  1.13245  1.13383
4923  1.13385  1.13509  1.13213  1.13222
4924  1.13199  1.13251  1.13189  1.13241
4925  1.13243  1.13303  1.12675  1.12787
4926  1.12785  1.13397  1.12580  1.13340
4927  1.13336  1.13417  1.12495  1.12648
4928  1.12650  1.13099  1.12501  1.12950
4929  1.12950  1.13064  1.12343  1.12922
4930  1.12916  1.12960  1.12898  1.12952
4931  1.12950  1.13341  1.12940  1.13118
4932  1.13116  1.13576  1.12757  1.13401
4933  1.13399  1.13714  1.13251  1.13464
4934  1.13465  1.13665  1.13208  1.13400
4935  1.13399  1.13558  1.13163  1.13331
4936  1.13362  1.13424  1.13292  1.13424
4937  1.13420  1.13677  1.13369  1.13651
4938  1.13651  1.14028  1.13454  1.13931
4939  1.13936  1.14037  1.13624  1.13799
4940  1.13793  1.14198  1.13675  1.14116

情节: enter image description here

我怎样才能将最后10行作为一项功能部署并仍然获得一个结果?

1 个答案:

答案 0 :(得分:0)

我已经编码了以下解决方案

cols = []
for i in range(1, 11):
    cl = "Open_{}".format(i)
    cols.append(cl)
    df[cl] = df['Open'].shift(i)
    cl = "High_{}".format(i)
    cols.append(cl)
    df[cl] = df['High'].shift(i)
    cl = "Low_{}".format(i)
    cols.append(cl)
    df[cl] = df['Low'].shift(i)
    cl = "Close_{}".format(i)
    cols.append(cl)
    df[cl] = df['Close'].shift(i)