Using polynomial features

Posted: 2018-08-09 21:17:52

Tags: python pandas numpy scikit-learn

Question

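First of all, I'm new to machine learning. I decided to test what I've learned on some financial data, and my machine learning model looks like this: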

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

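df = pd.read_csv("/Users/Documents/Trading.csv")
df_copy = df.copy()  # assumption: df_copy is used below but was not defined in the original snippet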
poly_features = PolynomialFeatures(degree=2, include_bias=False)
linear_reg = LinearRegression(fit_intercept = True)

X = df_copy[["open","volume", "base volume", "RSI_14"]]
X_poly = poly_features.fit_transform(X)[1]


y = df_copy[["high"]]

linear_reg.fit(X_poly, y)

x = linear_reg.predict([[1.905E-05, 18637.07503453,0.35522205,  69.95820948552947]])
print(x)

Everything works fine until I try to add PolynomialFeatures, at which point I get the following error:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
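A minimal reproduction of where the 1D array comes from, using a small hypothetical DataFrame in place of Trading.csv (the random data and the toy target are assumptions for illustration only). With degree=2 and include_bias=False, four input columns expand to 14 output columns, and the trailing [1] then selects just one row of that matrix:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for Trading.csv: 6 rows with the same 4 feature columns.
rng = np.random.default_rng(0)
df_copy = pd.DataFrame(rng.random((6, 4)),
                       columns=["open", "volume", "base volume", "RSI_14"])
y = df_copy["open"] * 2.0  # arbitrary toy target, for illustration only

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_all = poly_features.fit_transform(df_copy)
print(X_all.shape)   # (6, 14): 4 linear + 4 squared + 6 pairwise terms

X_row = X_all[1]     # [1] selects the second ROW only
print(X_row.shape)   # (14,): a 1D array, no longer samples x features

fit_failed = False
try:
    LinearRegression().fit(X_row, y)
except ValueError as err:
    fit_failed = True
    # First line of scikit-learn's "Expected 2D array" message.
    print("ValueError:", str(err).splitlines()[0])
```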

Attempts to fix this:

Attempt 1

I tried adding .values to X, but I still get the same error:

X_poly = poly_features.fit_transform(X.values)[1]
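Switching to .values does not change the shape problem, because PolynomialFeatures produces the same numbers from a DataFrame or from the underlying NumPy array, and the trailing [1] is still applied afterwards. A small check on a hypothetical 3-row frame with the question's column names:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical 3-row frame using the question's four column names.
X = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0],
                  [9.0, 10.0, 11.0, 12.0]],
                 columns=["open", "volume", "base volume", "RSI_14"])

# Separate transformer instances so neither fit affects the other.
from_frame = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
from_array = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X.values)

# Same values either way; .values only drops the column labels.
print(np.array_equal(from_frame, from_array))  # True
# The trailing [1] is what collapses the result to a single 1D row.
print(from_array[1].shape)                     # (14,)
```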

Attempt 2

I tried to fix it by appending .reshape(-1, 1) to the end of the X_poly expression:

X_poly = poly_features.fit_transform(X)[1].reshape(-1, 1)

But that just replaced the previous error with this one:

ValueError: Found input variables with inconsistent numbers of samples: [14, 5696]
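The counts in this second error line up with the first one: .reshape(-1, 1) turns the 14 polynomial features of a single row into 14 one-feature "samples", while y still has every row. A sketch reproducing it with random stand-in data of the same size (the data is hypothetical; only the row count 5696 comes from the error message):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

n_samples = 5696  # row count implied by the error message in the question
rng = np.random.default_rng(0)
X = rng.random((n_samples, 4))  # hypothetical stand-in for the 4 feature columns
y = rng.random(n_samples)       # hypothetical stand-in for the "high" column

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)[1].reshape(-1, 1)
print(X_poly.shape)  # (14, 1): one row's 14 features now posing as 14 samples

mismatch = False
message = ""
try:
    LinearRegression().fit(X_poly, y)
except ValueError as err:
    mismatch = True
    message = str(err)
    print(message)  # reports the inconsistent sample counts [14, 5696]
```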

Any help is much appreciated.

1 answer:

Answer 0 (score: 0)

It wants you to transform the input. Try:

X_poly = poly_features.fit_transform(X.values.reshape(1,-1))[1]
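Note that reshape(1, -1) on the full feature table would treat all 5696 rows as one single sample with 22784 features, which is not the shape a regression over rows needs. An alternative sketch, not taken from the answer: drop the [1] entirely so X_poly stays 2D with one row per sample, and expand the prediction input with the same fitted PolynomialFeatures. Here both steps are chained with scikit-learn's make_pipeline; the random data and the synthetic "high" column are hypothetical stand-ins for Trading.csv:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

FEATURES = ["open", "volume", "base volume", "RSI_14"]

# Hypothetical data standing in for Trading.csv (the real file is not available).
rng = np.random.default_rng(42)
df_copy = pd.DataFrame(rng.random((200, 4)), columns=FEATURES)
df_copy["high"] = df_copy["open"] * 1.1 + 0.01 * df_copy["RSI_14"] ** 2

X = df_copy[FEATURES]  # 2D: (n_samples, 4), with no [1] afterwards
y = df_copy["high"]    # 1D target series

# The pipeline fits the polynomial expansion and the regression together,
# then reapplies the SAME expansion automatically at predict time.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(fit_intercept=True),
)
model.fit(X, y)

# One new sample as a 1-row DataFrame with the same column names.
new_row = pd.DataFrame([[1.905e-05, 18637.07503453, 0.35522205, 69.95820948552947]],
                       columns=FEATURES)
prediction = model.predict(new_row)
print(prediction.shape)  # (1,)
```

Without the pipeline, the equivalent is to call poly_features.transform(...) on the new row before linear_reg.predict(...), since the regression was trained on the 14 expanded columns rather than the 4 raw ones.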