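I tried to implement gradient descent. It works when I test it on a sample dataset, but not on the Boston dataset.
Can you check what is wrong with the code? Why am I not getting the correct theta vector?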
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
X = load_boston().data
y = load_boston().target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train1 = np.c_[np.ones((len(X_train), 1)), X_train]
X_test1 = np.c_[np.ones((len(X_test), 1)), X_test]
eta = 0.0001
n_iterations = 100
m = len(X_train1)
tol = 0.00001
theta = np.random.randn(14, 1)
for i in range(n_iterations):
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train)
    if np.linalg.norm(X_train1) < tol:
        break
    theta = theta - (eta * gradients)
My weight vector ends up with shape (14, 354). What am I doing wrong here?
Answer 0 (score: 1)
Consider this (some statements are expanded for readability):
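for i in range(n_iterations):
    y_hat = X_train1.dot(theta)            # predictions, shape (n_samples, 1)
    error = y_hat - y_train[:, None]       # residuals, shape (n_samples, 1)
    gradients = 2/m * X_train1.T.dot(error)
    # Stop on a small gradient norm; the norm of X_train1 is constant and would never trigger the break
    if np.linalg.norm(gradients) < tol:
        break
    theta = theta - (eta * gradients)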
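y_hat has shape (n_samples, 1), while y_train has shape (n_samples,), where n_samples is 354 here. Subtracting them directly broadcasts to an (n_samples, n_samples) array, which is why the gradient, and then theta, ends up with shape (14, 354). Use the dummy-axis trick y_train[:, None] to give y_train the same shape as y_hat.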
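To see the broadcasting effect in isolation, here is a minimal sketch. It uses placeholder arrays with the shapes from the question (354 samples, 14 columns including the bias) rather than the real data, so the values are meaningless and only the shapes matter.

```python
import numpy as np

n_samples, n_features = 354, 14           # shapes taken from the question
X = np.ones((n_samples, n_features))      # placeholder design matrix (assumption)
theta = np.zeros((n_features, 1))
y = np.zeros(n_samples)                   # 1-D target, like y_train

y_hat = X.dot(theta)                      # shape (354, 1)

# (354, 1) - (354,) broadcasts to (354, 354): every prediction minus every target
bad_error = y_hat - y
bad_gradients = X.T.dot(bad_error)        # shape (14, 354), as reported in the question

# Adding a trailing axis makes the subtraction element-wise
good_error = y_hat - y[:, None]           # shape (354, 1)
good_gradients = X.T.dot(good_error)      # shape (14, 1), same as theta

print(bad_error.shape, bad_gradients.shape)
print(good_error.shape, good_gradients.shape)
```

Because theta is updated with theta - eta * gradients, a (14, 354) gradient also turns theta into a (14, 354) array after the first iteration.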
Answer 1 (score: 1)
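for i in range(n_iterations):
    # reshape(-1, 1) turns y_train into a column so the subtraction stays (n_samples, 1)
    gradients = 2/m * X_train1.T.dot(X_train1.dot(theta) - y_train.reshape(-1,1))
    # Stop on a small gradient norm; the norm of X_train1 is constant and would never trigger the break
    if np.linalg.norm(gradients) < tol:
        break
    theta = theta - (eta * gradients)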
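For an end-to-end check, the fixed loop can be run on data where the answer is known. The sketch below is an illustration under several assumptions: it uses synthetic, unit-scale features instead of the Boston data (load_boston was removed from scikit-learn in version 1.2), the function name gradient_descent and the values of eta, n_iterations and tol are chosen for this example, and the result is compared against np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the Boston data (an assumption, not the original dataset)
n_samples, n_features = 354, 13
X = rng.normal(size=(n_samples, n_features))
X1 = np.c_[np.ones((n_samples, 1)), X]            # prepend the bias column
true_theta = rng.normal(size=(n_features + 1, 1))
y = X1.dot(true_theta).ravel() + 0.01 * rng.normal(size=n_samples)  # 1-D target


def gradient_descent(X1, y, eta=0.1, n_iterations=5000, tol=1e-8):
    """Batch gradient descent for least squares (hypothetical helper)."""
    m = len(X1)
    y_col = y.reshape(-1, 1)                      # column vector, avoids broadcasting
    theta = np.zeros((X1.shape[1], 1))
    for _ in range(n_iterations):
        gradients = 2 / m * X1.T.dot(X1.dot(theta) - y_col)
        if np.linalg.norm(gradients) < tol:       # converged: gradient is nearly zero
            break
        theta = theta - eta * gradients
    return theta


theta = gradient_descent(X1, y)

# Reference solution from NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X1, y.reshape(-1, 1), rcond=None)

print(theta.shape)                                # (14, 1)
print(np.allclose(theta, theta_lstsq, atol=1e-4))
```

The learning rate here works because the synthetic features are all on the same scale. On real data whose columns differ by orders of magnitude, the features usually need to be standardized first, or eta has to be much smaller, for the iterations to converge.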