How do I choose the most suitable degree parameter when using PolynomialFeatures?

Asked: 2017-08-25 17:41:16

Tags: python machine-learning scikit-learn

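I have time series code that generates linear and quadratic trends. I am confused about what value to choose for the degree parameter. I came across the following definition: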

Within scikit-learn's PolynomialFeatures, when the argument degree is passed, all terms up to that degree are created.

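I just don't understand this definition. Is there an explanation that uses simple math? And how can I make sure I am using the best degree?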

Here is my code, in case you want a sample.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.tsa.api as smt
import random
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline


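# Synthetic series: four segments of 50 points with noisy levels 0, 30, 50 and 20
y = ([5 * np.random.normal() for j in range(50)]
     + [30 + 5 * np.random.normal() for j in range(50)]
     + [50 + 5 * np.random.normal() for j in range(50)]
     + [20 + 5 * np.random.normal() for j in range(50)])
# Time index as a single-column 2D array, the shape scikit-learn expects
X = np.arange(len(y)).reshape(-1, 1)

# Linear trend: ordinary least squares on the time index
model = LinearRegression()
model.fit(X, y)
trend = model.predict(X)

# Quadratic trend: expand the index to [1, x, x^2], then fit a ridge regression
model = make_pipeline(PolynomialFeatures(2), Ridge())
model.fit(X, y)
quadratic = model.predict(X)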

fig = plt.figure(1, figsize=(15, 9))
ax = fig.add_subplot(111)
ax.plot(trend, label="Linear Trend")
ax.plot(quadratic, label="Quadratic Trend")
ax.plot(X, y, label='Time Series')
ax.legend()
plt.show()

1 Answer:

Answer 0 (score: 0)

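You are using 2 as the degree, so the linear component is already included in the quadratic fit. For example, if the computed linear component is 2x - 5 and the quadratic term is 3x^2, then what the function returns is the sum, 3x^2 + 2x - 5.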
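To make "all terms up to that degree" concrete, and to give one possible way of comparing degrees, here is a minimal sketch rather than a definitive recipe. It first prints what PolynomialFeatures(degree=2) produces for a single input column, then scores degrees 1 through 6 by cross-validated mean squared error on a series shaped like the one in the question. Several choices here are assumptions that are not in the original post: the fixed random seed, the shuffled 5-fold KFold split, the candidate range of degrees, the StandardScaler step (added only to keep high powers of the time index numerically manageable), and plain LinearRegression in place of the question's Ridge.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Part 1: what "all terms up to that degree" means for one input column x.
# With degree=2 (and the default include_bias=True) each x becomes [1, x, x^2].
x_demo = np.array([[2.0], [3.0]])
expanded = PolynomialFeatures(degree=2).fit_transform(x_demo)
print(expanded)  # rows are [1, 2, 4] and [1, 3, 9]

# Part 2: compare candidate degrees by cross-validated error.
# Synthetic series similar to the question's; the seed is an assumption.
rng = np.random.default_rng(0)
levels = np.repeat([0.0, 30.0, 50.0, 20.0], 50)
y = levels + 5 * rng.normal(size=levels.size)
X = np.arange(y.size).reshape(-1, 1)

# Shuffled K-fold measures how well each curve fits held-out points inside
# the observed range; it does not measure forecasting beyond the last point.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for degree in range(1, 7):
    model = make_pipeline(
        StandardScaler(),            # assumption: rescale x before taking powers
        PolynomialFeatures(degree),
        LinearRegression(),
    )
    neg_mse = cross_val_score(model, X, y, cv=cv,
                              scoring="neg_mean_squared_error")
    scores[degree] = -neg_mse.mean()  # flip sign back to a plain MSE

best_degree = min(scores, key=scores.get)
for degree, mse in scores.items():
    print(f"degree={degree}: CV MSE = {mse:.2f}")
print("lowest CV error at degree", best_degree)
```

The degree with the lowest cross-validated error is a reasonable starting point, not a guarantee. If several degrees score about the same, the smaller one is usually the safer choice, since high-degree polynomials tend to behave badly near and beyond the ends of the data.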