Question

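Consider simple linear regression with a single feature, where x is the feature and w is the weight. The best-fit weights are given by w = (x^T x)^(-1) x^T y. When I compare the result from scikit-learn's regressor with the w I compute from this formula, the two differ significantly.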

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('Salary_Data.csv')
x = data.iloc[:,[0]].values
y = data.iloc[:,[1]].values

# Closed-form solution: theta = (x^T x)^(-1) x^T y
x_t = np.transpose(x)
first_inv = np.matmul(x_t, x)
second = np.matmul(x_t, y)
first = np.linalg.inv(first_inv)
theta = np.matmul(first, second)
y_prad = theta*x

# scikit-learn fit for comparison
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x, y)
y_prad2 = regressor.predict(x)

# Plot the data and both fitted lines
plt.scatter(x, y)
plt.plot(x, y_prad , 'red')
plt.plot(x, y_prad2, 'green')

Where did I go wrong (in the concept or in the code)?

Answer 1

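You forgot the intercept term. Add a column of ones to the x matrix and rerun the calculation. x should then have shape (30, 2), with the first column all ones, representing the constant that multiplies the intercept. The resulting theta should have shape (2, 1), where the first entry is the intercept and the second is the slope. With the extra column, compute the predictions as np.matmul(x, theta) rather than theta*x.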
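As a rough illustration of that fix, here is a minimal sketch. It assumes synthetic data in place of Salary_Data.csv (the numbers below are made up, not the asker's data) and uses NumPy's own least-squares solver as the cross-check, so it runs without scikit-learn. The names X, theta_lstsq, intercept and slope are introduced here for illustration only.

```python
import numpy as np

# Synthetic stand-in for Salary_Data.csv (assumed values, not the real data)
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=(30, 1))             # one feature, 30 samples
y = 25000.0 + 9500.0 * x + rng.normal(0.0, 3000.0, size=(30, 1))

# Design matrix: a column of ones (for the intercept) followed by the feature
X = np.column_stack([np.ones(len(x)), x])            # shape (30, 2)

# Normal equation theta = (X^T X)^(-1) X^T y, solved as a linear system
# instead of forming the inverse explicitly
theta = np.linalg.solve(X.T @ X, X.T @ y)            # shape (2, 1)
intercept, slope = theta.ravel()

# Cross-check against NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predictions now include the intercept
y_pred = X @ theta

print("intercept:", intercept, "slope:", slope)
print("matches lstsq:", np.allclose(theta, theta_lstsq))
```

On the real data, intercept and slope computed this way should agree with regressor.intercept_ and regressor.coef_, since LinearRegression fits an intercept by default (fit_intercept=True).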

A good reference on the matrix formulation of linear regression: Matrix Formulation of Linear Regression
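For the single-feature case in this thread, the matrix formulation can be written out roughly as follows. This is a sketch in the thread's notation; the symbols b (intercept) and m (slope) are names assumed here, not taken from the reference.

```latex
\[
X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},
\qquad
\hat{\theta} = \begin{bmatrix} \hat{b} \\ \hat{m} \end{bmatrix}
             = (X^{T}X)^{-1}X^{T}y
\]
\[
\text{Without the column of ones:}\quad
\hat{w} = (x^{T}x)^{-1}x^{T}y = \frac{\sum_{i} x_i y_i}{\sum_{i} x_i^{2}}
\]
```

The second expression is the least-squares slope of a line forced through the origin, which is why the hand-computed line in the question differs from the scikit-learn fit.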
