Problem in multivariate linear regression code

Time: 2019-06-05 11:52:20

Tags: python machine-learning regression linear-regression

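I wrote a multivariate linear regression from scratch, but when I run the code, the final values of theta are wrong (some entries are on the order of 10^20). Can anyone help me?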

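This is the Boston housing dataset, and I am trying to predict house prices. I wrote the linear regression code following the algorithm Prof. Andrew Ng presents in his Machine Learning course, and tried to implement that algorithm in Python. But my theta values still come out wrong.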
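
For reference, the batch gradient descent update is usually written as below. The notation is an assumption here, not taken from the post: m is the number of training examples, n is the number of features, and x_0 = 1 for the intercept term.

```latex
h_\theta(x) = \theta^{T} x
\qquad
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)},
\quad j = 0, 1, \dots, n
```

All θ_j are meant to be updated simultaneously, that is, every sum is evaluated with the old θ before any entry is overwritten.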

Here is the data link.

Here is my code:

import pandas as pd
import numpy as np

X_train = pd.read_csv("train.csv")
X_test = pd.read_csv("test.csv")
X_train.head()

X_train.shape

y_train = X_train['medv']
X_train = X_train.drop(columns = ['medv'], axis = 1)

theta = np.zeros(14)

alpha = 0.01
m = len(theta)

X_train.head()
X_train = X_train.drop(columns = ['ID'], axis = 1)
X_test = X_test.drop(columns = ['ID'], axis = 1)

X_train = np.column_stack((np.ones(len(X_train)),X_train))

X_train.shape

for j in range(1000):
    for i in range(m):
        h = np.dot(X_train, theta)
        d_J = np.dot((h - y_train), X_train[:, i])
        theta[i] = theta[i] - (alpha)*(1/m)*d_J

Theta values:

array([[ 5.41571429e+00],
       [ 7.35302513e+00],
       [ 5.96202743e+01],
       [-6.13110873e+02],
       [ 1.00881890e+02],
       [ 9.36757919e+02],
       [ 8.19165542e+03],
       [-7.07535737e+05],
       [ 3.54080584e+07],
       [-1.02568786e+08],
       [ 1.22841775e+11],
       [-2.25615368e+14],
       [ 3.50107077e+17],
       [-3.56510417e+20]])

1 answer:

Answer 0 (score: 0)

It would help if we could get a link to your dataset.
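
Without the data it is hard to be sure, but three things in the posted loop look like likely causes of the blow-up: `m = len(theta)` is the number of parameters (14) rather than the number of training rows, the entries of theta are overwritten one at a time instead of simultaneously, and the features are not scaled, which with `alpha = 0.01` commonly makes gradient descent diverge. Below is a minimal sketch of how the loop could look with those points addressed. The helper names `standardize` and `gradient_descent` and the synthetic data are made up for illustration; they are not from the question and this is not the Boston dataset.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance (hypothetical helper)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - mean) / std, mean, std


def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent; all theta entries are updated simultaneously."""
    m, n = X.shape  # m = number of training examples, n = number of columns
    theta = np.zeros(n)
    for _ in range(n_iters):
        error = X @ theta - y          # shape (m,)
        gradient = (X.T @ error) / m   # shape (n,)
        theta = theta - alpha * gradient
    return theta


# Synthetic stand-in data (an assumption, used only so the sketch runs).
rng = np.random.default_rng(0)
raw = rng.uniform(0, 500, size=(200, 3))  # deliberately large-valued features
true_w = np.array([0.5, -0.2, 0.1])
y = 3.0 + raw @ true_w + rng.normal(0, 0.1, size=200)

scaled, mean, std = standardize(raw)
X = np.column_stack((np.ones(len(scaled)), scaled))  # prepend intercept column
theta = gradient_descent(X, y, alpha=0.1, n_iters=2000)
print(theta)
```

If you go this route, remember that the test set has to be scaled with the `mean` and `std` computed on the training set before you predict with `theta`.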