I am running a linear regression in Python, but unfortunately my code never produces a result. It raises no errors, but it also never prints an answer. I don't know whether my computer is too slow, whether the number of iterations is too high, or whether something else is wrong.
Expected output: the values of theta. Instead the program gets stuck in an infinite loop.
Code:
import pandas as p

# Using pandas to read features from our training set
dinit = p.read_csv('train.csv')
df = p.read_csv('train.csv')
b = list([df['crim'], df['zn'], df['indus'], df['chas'], df['nox'], df['rm'], df['age'], df['dis'], df['rad'], df['tax'], df['ptratio'], df['black'], df['lstat'], df['medv']])

bmean = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Calculating mean
for i in range(14):
    for j in range(333):
        bmean[i] += b[i][j]
for i in range(14):
    bmean[i] /= 333

bsigma = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Calculating standard deviation
for i in range(14):
    for j in range(333):
        bsigma[i] += (b[i][j] - bmean[i]) ** 2
for i in range(14):
    bsigma[i] = (bsigma[i] / 333) ** 0.5

# NOTE :- Replace 13 and 333 with dimensions of list+1
# Normalising data
for i in range(14):
    for j in range(333):
        b[i][j] = (b[i][j] - bmean[i]) / bsigma[i]

theta = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

def costfun(theta, b):
    hypo = 0
    cost = 0
    y = 0
    for j in range(332):
        for i in range(13):
            hypo += (theta[i] * b[i][j])
        y += b[13][j]
        cost += (hypo - y) ** 2
        hypo = 0
        y = 0
    cost /= 333
    return cost

print(b)

def GradientDescent(theta, b):
    alpha = 0.1
    hypo = 0
    cost = 0
    y = 0
    while (costfun(theta, b) > 1):
        for j in range(332):
            for i in range(13):
                hypo += (theta[i] * b[i][j])
            y += b[13][j]
            for i in range(13):
                theta[i] = theta[i] - ((hypo - y) * b[i][j] * alpha / 333)
    return theta

print(bmean)
print(GradientDescent(theta, b))
Answer (score: 0):
There is a lot to say about this code, so let's take it from the top:
0) There are many ways to implement linear regression. Just to name a few points:
1) If you are doing maths, use numpy: its built-in vectorised operations will spare you a lot of error-prone loops. For example, here is roughly the first 30 lines of your code:
import numpy as np
# Generate toy data, mu=10, sigma=5
M, N = 1000, 14
features = np.random.normal(10, 5, size=(M,N))
# Normalise
norm = (features - features.mean(axis=0)) / features.std(axis=0)
# Initialise theta
theta = np.zeros(N)
(As an aside, nearly the same operations work directly on the pandas DataFrame, so why bother converting to lists at all? See the short sketch below.)
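For illustration, here is a minimal sketch of that column-wise normalisation done directly on the DataFrame, assuming, as in the question, that train.csv contains only numeric columns:

import pandas as pd

df = pd.read_csv('train.csv')
# Column-wise normalisation in one expression; note that pandas' std()
# uses ddof=1 (sample standard deviation) by default, unlike the manual
# division by 333 in the question.
df_norm = (df - df.mean()) / df.std()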
2) The same goes for your cost function. Linear regression should be as simple as Y = aX + b.
3) Your while loop has no stopping condition other than the cost check. In other words, if your algorithm does not converge (which, judging by your question, is most likely the case), it will never stop... See the sketch after the link below.
Here is an example of how to implement linear regression in Python: https://www.cs.toronto.edu/~frossard/post/linear_regression/
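To make points 2) and 3) concrete, here is a minimal NumPy sketch of a vectorised cost function and a gradient-descent loop with an explicit iteration cap. It reuses the norm array from the toy-data snippet above; the function names, learning rate and tolerance are illustrative choices, not a reference implementation:

import numpy as np

def cost(theta, X, y):
    # Mean squared error of the linear hypothesis X @ theta.
    residual = X @ theta - y
    return (residual ** 2).mean()

def gradient_descent(theta, X, y, alpha=0.1, tol=1e-6, max_iter=10_000):
    # Terminate on convergence OR after max_iter steps, so the loop
    # always stops even when the cost never drops below the tolerance.
    for _ in range(max_iter):
        grad = 2 * X.T @ (X @ theta - y) / len(y)
        theta = theta - alpha * grad
        if cost(theta, X, y) < tol:
            break
    return theta

# Treat the last normalised column as the target and the rest as features.
X, y = norm[:, :-1], norm[:, -1]
theta = gradient_descent(np.zeros(X.shape[1]), X, y)

With the random toy data the cost never reaches the tolerance, so the loop exits via the iteration cap, which is exactly the safeguard the original while loop is missing. For comparison, the same least-squares fit can also be obtained in closed form with np.linalg.lstsq(X, y, rcond=None).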