I am implementing multivariate linear regression in pure Python, as shown in the code below. Can someone tell me what is wrong with this code? I did the same thing for univariate linear regression and it worked fine!
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x_df=pd.DataFrame([[2.0,70.0],[3.0,30.0],[4.0,80.0],[4.0,20.0],[3.0,50.0],[7.0,10.0],[5.0,50.0],[3.0,90.0],[2.0,20.0]])
y_df=pd.DataFrame([79.4,41.5,97.5,36.1,63.2,39.5,69.8,103.5,29.5])
x_df=x_df.drop(x_df.columns[2:], axis=1)
#print(x_df)
m=len(y_df)
#print(m)
x_df['intercept']=1
X=np.array(x_df)
#print(X)
#print(X.shape)
y=np.array(y_df).flatten()
#print(y.shape)
theta=np.array([0,0,0])
#print(theta)
def hypothesis(x,theta):
    return np.dot(x,theta)
#print(hypothesis(X,theta))
def cost(x,y,theta):
    m=y.shape[0]
    h=np.dot(x,theta)
    return np.sum(np.square(y-h))/(2.0*m)
#print(cost(X,y,theta))
def gradientDescent(x,y,theta,alpha=0.01,iter=1500):
    m=y.shape[0]
    for i in range(iter):
        h=hypothesis(x,theta)
        error=h-y
        update=np.dot(error,x)
        theta=np.subtract(theta,((alpha*update)/m))
    print('theta',theta)
    print('hyp',h)
    print('y',y)
    print('error',error)
    print('cost',cost(x,y,theta))
print(gradientDescent(X,y,theta))
The output I get is:
theta [ nan nan nan]
hyp [ nan nan nan nan nan nan nan nan nan]
y [ 79.4 41.5 97.5 36.1 63.2 39.5 69.8 103.5 29.5]
error [ nan nan nan nan nan nan nan nan nan]
cost nan
Can someone help me fix this? I have been stuck on it for almost 5 hours!
Answer 0 (score: 0)
Your learning rate is too large for gradient descent to converge; try alpha = 0.00001. With the second feature ranging up to ~90, the gradient magnitudes are large, so a step size of 0.01 overshoots and the parameters blow up to NaN.
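A minimal sketch of the same setup with the smaller learning rate, to check that the cost now stays finite and decreases. The variable names follow the question; the vectorized `@` products replace `np.dot` but compute the same thing.

```python
import numpy as np

# Same data as in the question, with the intercept column appended.
X = np.array([[2.0, 70.0], [3.0, 30.0], [4.0, 80.0],
              [4.0, 20.0], [3.0, 50.0], [7.0, 10.0],
              [5.0, 50.0], [3.0, 90.0], [2.0, 20.0]])
X = np.hstack([X, np.ones((X.shape[0], 1))])
y = np.array([79.4, 41.5, 97.5, 36.1, 63.2, 39.5, 69.8, 103.5, 29.5])

def cost(x, y, theta):
    # Mean squared error divided by 2, as in the question.
    h = x @ theta
    return np.sum((y - h) ** 2) / (2.0 * len(y))

def gradient_descent(x, y, theta, alpha=1e-5, iters=1500):
    m = len(y)
    for _ in range(iters):
        error = x @ theta - y
        theta = theta - alpha * (error @ x) / m  # batch gradient step
    return theta

theta0 = np.zeros(3)
theta = gradient_descent(X, y, theta0)
print('initial cost:', cost(X, y, theta0))
print('final cost  :', cost(X, y, theta))
```

Convergence with alpha = 1e-5 is slow; scaling the features to comparable ranges (e.g. dividing the second column by its standard deviation) would let a larger step size work.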