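I'm trying to implement multivariate gradient descent. The single-variable version works fine, but with multiple variables I get strange errors. The data comes from a CSV file with 6 or 7 columns; I don't use all of them, which is why the np.delete call is there.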
import numpy as np
import pandas as pd
import csv

alpha=.01
iterations=1000

with open('sample_submission.csv','r') as csv_file:
    csv_reader=list(csv.reader(csv_file,delimiter=','))

csv_reader=np.array(csv_reader[1:],dtype=np.float64)
data=np.delete(csv_reader,[0,2,5],axis=1)
X=(data[:,0:3])
y=(data[:,3])
X=np.matrix(X)
y=np.matrix(y)
theta=np.matrix(np.array([0,0,0]))

def computeCost(X,y,theta):
    z=np.power(((X*theta.T)-y),2)
    xxx=np.sum(z)/(2*len(X))
    print(xxx)
    return xxx

def gradientDescent(X,y,theta,alpha,iterations):
    temp=np.matrix(np.zeros(theta.shape))
    parameters=int(theta.ravel().shape[1])
    cost=np.zeros(iterations)
    for i in range(iterations):
        error=(X*theta.T)-y
        for j in range(parameters):
            term=np.multiply(error,X[:,j])
            temp[0,j]=theta[0,j]-((alpha/len(X))*np.sum(term))
        theta=temp
        cost[i]=computeCost(X,y,theta)
    return theta,cost

g,cost=gradientDescent(X,y,theta,alpha,iterations)
computeCost(X,y,g)
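One thing that may be worth checking before running the loop is the shape of each operand. The sketch below uses small made-up arrays (not the CSV data) and assumes the same np.matrix construction as in the script: np.matrix applied to a 1-D array yields a 1 x m row, so subtracting it from the m x 1 product X*theta.T broadcasts to an m x m matrix instead of a column of residuals.

```python
import numpy as np

# Made-up stand-in data: 4 samples, 3 features (not the real CSV).
X = np.matrix(np.arange(12, dtype=np.float64).reshape(4, 3))
y_row = np.matrix(np.array([1.0, 2.0, 3.0, 4.0]))  # built the same way as y in the script
theta = np.matrix(np.array([0.0, 0.0, 0.0]))

pred = X * theta.T
print(pred.shape)            # (4, 1)
print(y_row.shape)           # (1, 4)
print((pred - y_row).shape)  # (4, 4): broadcast, not one residual per sample

# Transposing y gives one residual per sample.
y_col = y_row.T
print((pred - y_col).shape)  # (4, 1)
```

If the shapes in the real script match this pattern, transposing y (or reshaping it to a column) would be one candidate fix, although it may not be the only issue.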
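I'm trying to understand these errors and where to start. The inf and nan lines repeat many more times than shown here; I removed most of them to keep the post short. Any help or a pointer in the right direction would be greatly appreciated. Output: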
[gato@archlinux test1]$ python t.py
6.195789561917885e+31
2.030393130101553e+50
6.653766765521488e+68
2.1804945815191573e+87
7.14566167944397e+105
2.341692626518703e+124
7.673921049000399e+142
2.5148076053789887e+161
8.241233199676728e+179
2.700720504669319e+198
8.85048519756413e+216
2.9003774399044243e+235
9.504777541712369e+253
3.1147944703506266e+272
1.0207439942649033e+291
/usr/lib/python3.6/site-packages/numpy/core/_methods.py:32: RuntimeWarning: overflow encountered in reduce
  return umr_sum(a, axis, dtype, out, keepdims)
inf
t.py:23: RuntimeWarning: overflow encountered in power
z=np.power(((X*theta.T)-y),2)
inf
inf
inf
inf
t.py:38: RuntimeWarning: invalid value encountered in double_scalars
temp[0,j]=theta[0,j]-((alpha/len(X))*np.sum(term))
nan
nan
nan
nan
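The cost growing by more than a factor of 10^18 on every iteration looks like divergence rather than a NumPy problem. A common cause is unscaled features combined with a step size that is too large for them, though that cannot be confirmed without the CSV. The following is a minimal sketch under those assumptions: it uses synthetic data and plain arrays instead of np.matrix, and the names standardize, compute_cost, and gradient_descent as well as the generated data are illustrative, not taken from the original script.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def compute_cost(X, y, theta):
    """Half the mean squared residual."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(X))

def gradient_descent(X, y, theta, alpha, iterations):
    """Batch gradient descent, updating all parameters at once."""
    cost = np.zeros(iterations)
    for i in range(iterations):
        residual = X @ theta - y
        theta = theta - (alpha / len(X)) * (X.T @ residual)
        cost[i] = compute_cost(X, y, theta)
    return theta, cost

# Synthetic stand-in for the CSV: 3 features on a large scale (assumption).
rng = np.random.default_rng(0)
raw = rng.uniform(0, 1000, size=(200, 3))
y = raw @ np.array([2.0, -1.0, 0.5]) + 10.0

# Standardize the features, then prepend a column of ones for the intercept.
X = np.column_stack([np.ones(len(raw)), standardize(raw)])
theta, cost = gradient_descent(X, y, np.zeros(X.shape[1]), alpha=0.01, iterations=1000)
print(cost[0], cost[-1])
```

Here y is kept as a 1-D array so the residual has one entry per sample, and the update builds a new theta each iteration instead of writing into a shared temp buffer. If scaling is not desired, lowering alpha by several orders of magnitude might also stop the overflow, at the price of slower convergence.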