Question

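I am trying to get a better understanding of what the .fit_transform() method of scikit-learn's PolynomialFeatures class outputs.

As I understand it, the method does two things: 1) it generates a model of the data by fitting the data to a regression algorithm, and 2) it creates new data based on the model found in 1.

What I don't understand is the output. Here is my code: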

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10


X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)
# reshape(-1, 1) infers the row count, so this still works if the split size changes
X_train1 = X_train.reshape(-1, 1)
y_train1 = y_train.reshape(-1, 1)

def answer_one():
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    poly1 = PolynomialFeatures(degree=1)

    X_poly1 = poly1.fit_transform(X_train1)

    return X_poly1

answer_one()

The output I get is:

array([[  1.        ,  10.08877265],
       [  1.        ,   3.23065446],
       [  1.        ,   1.62431903],
       [  1.        ,   9.31004929],
       [  1.        ,   7.17166586],
       [  1.        ,   4.96972856],
       [  1.        ,   8.14799756],
       [  1.        ,   2.59103578],
       [  1.        ,   0.35281047],
       [  1.        ,   3.375973  ],
       [  1.        ,   8.72363612]])

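I assume the second number in each inner array is a value calculated by the model, but I don't understand what each number actually represents.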

Answer 1

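From the PolynomialFeatures documentation:

Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].

In your case, the output is every combination of the x column with degree less than or equal to 1: [1, x]. The first column is x**0 (always 1) and the second column is x**1 (your original values, unchanged).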
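A small sketch may make the column layout concrete. It reproduces the documentation's [a, b] example with made-up values a=2 and b=3, then checks the degree-1 case by hand. The variable names and sample values are illustrative assumptions, not taken from the question.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features; a=2 and b=3 are made-up illustrative values.
sample = np.array([[2.0, 3.0]])

# Degree 2 yields the columns [1, a, b, a^2, ab, b^2].
poly2 = PolynomialFeatures(degree=2)
expanded = poly2.fit_transform(sample)
print(expanded)  # [[1. 2. 3. 4. 6. 9.]]

# powers_ lists the exponent applied to each input feature in each output column.
print(poly2.powers_)

# Degree 1 on a single column x is just [x**0, x**1].
x = np.array([[0.5], [1.5], [2.5]])
degree1 = PolynomialFeatures(degree=1).fit_transform(x)
by_hand = np.hstack([x ** 0, x ** 1])
print(np.allclose(degree1, by_hand))  # True
```

Each row of powers_ corresponds to one output column, so the row [1, 1] is the ab term and the row [0, 0] is the constant 1.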

Answer 2

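You have slightly misunderstood what PolynomialFeatures does. The idea is not to fit a model at all, but simply to create new features by multiplying existing features together. The example in the documentation explains this well: if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].

So what you see in your example is simply the bias term (the column of ones) and your input. If you set `include_bias=False` when constructing PolynomialFeatures, the column of ones will go away.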
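As a rough illustration of both points, the sketch below compares the output with and without the bias column and then fits a regression as a separate step. The data is hypothetical (y = x^2 on four points), chosen only so that the result is easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical single-feature data, shaped (n_samples, 1) as scikit-learn expects.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 4.0, 9.0, 16.0])  # y = x^2, chosen so the fit is exact

# With the default include_bias=True, the first column is all ones.
with_bias = PolynomialFeatures(degree=2).fit_transform(X)

# include_bias=False drops that column, leaving [x, x^2].
without_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(with_bias.shape)     # (4, 3)
print(without_bias.shape)  # (4, 2)

# The regression is a separate step; the transformer never sees y.
model = LinearRegression().fit(without_bias, y)
print(np.allclose(model.predict(without_bias), y))  # True
```

LinearRegression fits its own intercept by default (fit_intercept=True), so the column of ones is redundant when the expanded features are passed to it.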

Output of the .fit_transform method
