Question

I am fitting a linear regression as follows:

from sklearn.linear_model import LinearRegression

x = [1,2,3,4,5,6,7]
y = [1,2,1,3,2.5,2,5]

# Create linear regression object
regr = LinearRegression()

# Train the model using the training sets
regr.fit([x], [y])

# print(x)
regr.predict([[1, 2000, 3, 4, 5, 26, 7]])

This produces:

array([[1. , 2. , 1. , 3. , 2.5, 2. , 5. ]])

Why can't I predict from a single x value when using the predict function?

Trying regr.predict([[2000]])

returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-3a8b477f5103> in <module>()
     11 
     12 # print(x)
---> 13 regr.predict([[2000]])

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in predict(self, X)
    254             Returns predicted values.
    255         """
--> 256         return self._decision_function(X)
    257 
    258     _preprocess_data = staticmethod(_preprocess_data)

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in _decision_function(self, X)
    239         X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
    240         return safe_sparse_dot(X, self.coef_.T,
--> 241                                dense_output=True) + self.intercept_
    242 
    243     def predict(self, X):

/usr/local/lib/python3.6/dist-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
    138         return ret
    139     else:
--> 140         return np.dot(a, b)
    141 
    142 

ValueError: shapes (1,1) and (7,7) not aligned: 1 (dim 1) != 7 (dim 0)

Answer 1

When you do this:

regr.fit([x], [y])

you are essentially passing in this:

regr.fit([[1,2,3,4,5,6,7]], [[1,2,1,3,2.5,2,5]])

That is, X has shape (1,7) and y has shape (1,7).

Now look at the documentation of fit():

Parameters:
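One way to check these shapes is to let NumPy do the same list-to-array conversion. This is a minimal sketch; the names X_wrapped and y_wrapped are only illustrative and do not appear in the question.

```python
import numpy as np

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 3, 2.5, 2, 5]

# Wrapping a flat list in another list adds a leading axis of length 1
X_wrapped = np.array([x])
y_wrapped = np.array([y])

print(X_wrapped.shape)  # (1, 7): read as 1 sample with 7 features
print(y_wrapped.shape)  # (1, 7): read as 1 sample with 7 targets
```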

X : numpy array or sparse matrix of shape [n_samples,n_features]
    Training data

y : numpy array of shape [n_samples, n_targets]
    Target values. Will be cast to X’s dtype if necessary

So here the model assumes that your data consists of a single sample with 7 features and 7 targets. See the scikit-learn user guide for more information on multi-output regression.

So at prediction time the model needs data with 7 features, of shape (n_samples_to_predict, 7), and it will output data of shape (n_samples_to_predict, 7).
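The effect can be seen by inspecting the fitted attributes. The sketch below repeats the fit from the question and assumes a reasonably recent scikit-learn; the variable pred is only illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 3, 2.5, 2, 5]

regr = LinearRegression()
regr.fit([x], [y])  # interpreted as 1 sample, 7 features, 7 targets

print(regr.coef_.shape)       # (7, 7): one row of 7 weights per target
print(regr.intercept_.shape)  # (7,)

# Any input with exactly 7 features is accepted; the output has 7 columns
pred = regr.predict([[1, 2000, 3, 4, 5, 26, 7]])
print(pred.shape)  # (1, 7)
```

With only one training sample, the prediction simply reproduces y, which matches the output shown in the question.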

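If instead you want each x value to be one sample paired with one y value (7 samples with 1 feature each), then you need shape (7,1) for the input x and shape (7,) or (7,1) for the target y.

As @WStokvis said in the comments, you need to do this: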

import numpy as np
X = np.array(x).reshape(-1, 1)
y = np.array(y)          # You may omit this step if you want

regr.fit(X, y)           # Don't wrap it in []

Then, again, at prediction time:

X_new = np.array([1, 2000, 3, 4, 5, 26, 7]).reshape(-1, 1)
regr.predict(X_new)

After that, the following will not raise an error:

regr.predict([[2000]])

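because the input now has the required shape.

Update for the comment:

When you pass [[2000]], it is converted internally to np.array([[2000]]), so its shape is (1,1). This is read as (n_samples, n_features) with n_features = 1. That is correct for the model, because at training time the data had shape (n_samples, 1). So this works.

Now suppose you have: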
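Putting the pieces together, a minimal end-to-end sketch of the corrected workflow could look like this. The names single and batch are illustrative, and the shape comments assume a 1-d target y as above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 3, 2.5, 2, 5]

# 7 samples with 1 feature each
X = np.array(x).reshape(-1, 1)

regr = LinearRegression()
regr.fit(X, y)
print(regr.coef_.shape)  # (1,): a single slope

# One new sample with one feature
single = regr.predict([[2000]])
print(single.shape)  # (1,)

# Seven new samples, reshaped into a column
batch = regr.predict(np.array([1, 2000, 3, 4, 5, 26, 7]).reshape(-1, 1))
print(batch.shape)  # (7,)
```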

X_new = [1, 2000, 3, 4, 5, 26, 7]  # not yet wrapped in a numpy array or reshaped with reshape(-1, 1)

Again, it will be converted internally to:

X_new = np.array([1, 2000, 3, 4, 5, 26, 7])

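So now X_new has shape (7,). Note that it is only a one-dimensional array. Whether it is a row vector or a column vector does not matter; it is just a 1-d array of shape (n,).

So scikit-learn cannot tell whether n_samples=n and n_features=1, or the other way round (n_samples=1 and n_features=n). See my other answer, which explains this.

That is why we need to convert the 1-d array to 2-d explicitly with reshape(-1,1). I hope that makes it clear.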
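The ambiguity can be illustrated with a short sketch. It assumes a recent scikit-learn, where passing a 1-d array to predict raises a ValueError; the names as_samples, as_features and raised are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Model trained on 7 samples with 1 feature each
X = np.arange(1, 8).reshape(-1, 1)
y = [1, 2, 1, 3, 2.5, 2, 5]
regr = LinearRegression().fit(X, y)

X_new = np.array([1, 2000, 3, 4, 5, 26, 7])
print(X_new.shape)  # (7,): neither a row nor a column

# The two possible 2-d readings of the same data
as_samples = X_new.reshape(-1, 1)   # (7, 1): 7 samples, 1 feature
as_features = X_new.reshape(1, -1)  # (1, 7): 1 sample, 7 features
print(as_samples.shape, as_features.shape)

# The 1-d array is rejected instead of guessing between the two readings
try:
    regr.predict(X_new)
    raised = False
except ValueError:
    raised = True
print(raised)

# The explicit column form matches the training shape and works
print(regr.predict(as_samples).shape)  # (7,)
```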
