Question

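So I'm trying to do multivariate gradient descent. With a single variable I can get it to work, but when I use multiple variables I get weird errors. I'm reading the data from a csv file that has 6 or 7 columns, but I'm not using all of them, which is where my np.delete comes from.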

import numpy as np
import pandas as pd
import csv
alpha=.01
iterations=1000
with open('sample_submission.csv','r') as csv_file:
    csv_reader=list(csv.reader(csv_file,delimiter=','))
    csv_reader=np.array(csv_reader[1:],dtype=np.float64)
    data=np.delete(csv_reader,[0,2,5],axis=1)

X=(data[:,0:3])
y=(data[:,3])
X=np.matrix(X)
y=np.matrix(y)
theta=np.matrix(np.array([0,0,0]))

def computeCost(X,y,theta):
    z=np.power(((X*theta.T)-y),2)
    xxx=np.sum(z)/(2*len(X))
    print(xxx)
    return xxx

def gradientDescent(X,y,theta,alpha,iterations):
    temp=np.matrix(np.zeros(theta.shape))
    parameters=int(theta.ravel().shape[1])
    cost=np.zeros(iterations)

    for i in range(iterations):
        error=(X*theta.T)-y

        for j in range(parameters):
            term=np.multiply(error,X[:,j])
            temp[0,j]=theta[0,j]-((alpha/len(X))*np.sum(term))

        theta=temp
        cost[i]=computeCost(X,y,theta)
    return theta,cost

g,cost=gradientDescent(X,y,theta,alpha,iterations)
computeCost(X,y,g)
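
One thing that may be worth checking in the code above is the shape of y. np.matrix applied to a 1-D array gives a row of shape (1, m), so subtracting it from the (m, 1) product X*theta.T broadcasts to an (m, m) matrix instead of one residual per row. The following is only a sketch on made-up numbers (the 5x3 array is a hypothetical stand-in for the CSV data) that shows the shapes involved:

```python
import numpy as np

# Hypothetical stand-in for the CSV data: 5 rows, 3 feature columns.
m = 5
X = np.matrix(np.arange(15, dtype=np.float64).reshape(m, 3))
y_flat = np.arange(m, dtype=np.float64)

y_row = np.matrix(y_flat)        # what the code above builds: shape (1, m)
y_col = np.matrix(y_flat).T      # column vector: shape (m, 1)
theta = np.matrix(np.zeros((1, 3)))

pred = X * theta.T               # shape (m, 1)
bad_error = pred - y_row         # (m, 1) minus (1, m) broadcasts to (m, m)
good_error = pred - y_col        # shape (m, 1), one residual per row

print(y_row.shape, y_col.shape)
print(bad_error.shape, good_error.shape)
```

If the real data behaves the same way, transposing y (or reshaping it to a column) before calling gradientDescent would keep error at one value per training row.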

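I'm trying to understand these errors and where to start. The inf and nan lines repeat many more times than shown here; I trimmed some of them for this post. Any help or a pointer in the right direction would be greatly appreciated. The output: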

[gato@archlinux test1]$ python t.py
6.195789561917885e+31
2.030393130101553e+50
6.653766765521488e+68
2.1804945815191573e+87
7.14566167944397e+105
2.341692626518703e+124
7.673921049000399e+142
2.5148076053789887e+161
8.241233199676728e+179
2.700720504669319e+198
8.85048519756413e+216
2.9003774399044243e+235
9.504777541712369e+253
3.1147944703506266e+272
1.0207439942649033e+291
/usr/lib/python3.6/site-packages/numpy/core/_methods.py:32: RuntimeWarning: overflow encountered in reduce
   return umr_sum(a, axis, dtype, out, keepdims)
inf
t.py:23: RuntimeWarning: overflow encountered in power
  z=np.power(((X*theta.T)-y),2)
inf
inf
inf
inf
t.py:38: RuntimeWarning: invalid value encountered in double_scalars
  temp[0,j]=theta[0,j]-((alpha/len(X))*np.sum(term))
nan
nan
nan
nan
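
A cost that grows by many orders of magnitude on every iteration until it overflows to inf and then nan is the usual signature of gradient descent diverging. Without the CSV it is not possible to say for certain why, but two likely contributors are the shape of y and feature columns whose scale is too large for alpha=.01. The sketch below is not the poster's code: it uses plain NumPy arrays with @ instead of np.matrix, synthetic data in place of sample_submission.csv, standardised features, and an added intercept column, all of which are assumptions made for illustration.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost J = sum((X @ theta - y)^2) / (2m)."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

def gradient_descent(X, y, theta, alpha, iterations):
    """Batch gradient descent with a vectorised, simultaneous update."""
    cost = np.zeros(iterations)
    for i in range(iterations):
        residual = X @ theta - y                      # shape (m,)
        theta = theta - (alpha / len(y)) * (X.T @ residual)
        cost[i] = compute_cost(X, y, theta)
    return theta, cost

# Hypothetical synthetic data standing in for the CSV columns.
rng = np.random.default_rng(0)
m = 200
raw = rng.uniform(0, 5000, size=(m, 3))               # large, unscaled features
y = raw @ np.array([2.0, -1.0, 0.5]) + 100.0 + rng.normal(0, 10, size=m)

# Standardise each feature column, then prepend an intercept column of ones
# (the intercept is an assumption; the original code has none).
mu, sigma = raw.mean(axis=0), raw.std(axis=0)
X = np.column_stack([np.ones(m), (raw - mu) / sigma])

theta0 = np.zeros(X.shape[1])                         # float zeros, shape (4,)
theta, cost = gradient_descent(X, y, theta0, alpha=0.01, iterations=1000)

print(cost[0], cost[-1])
print(np.isfinite(cost).all())
```

With the features on a comparable scale, the same alpha and iteration count keep the cost finite and decreasing on this synthetic data. If the real data still diverges after scaling and fixing the shape of y, lowering alpha would be the next thing to try.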

Multivariate linear regression with Python inf errors

0 answers: