Infinite loop or slow computer

Date: 2019-04-06 09:09:13

Tags: python pandas machine-learning linear-regression

I'm running a linear regression problem in Python, but unfortunately my code doesn't produce any results. It shows no errors, yet it never prints an answer either. I don't know whether my computer is just too slow, whether there are too many iterations, or something else.

Expected output: the values of theta. Instead, the program seems stuck in an infinite loop.

Code:

import pandas as p
# Using pandas to read features from our training set
dinit = p.read_csv('train.csv')
df = p.read_csv('train.csv')

b = list([df['crim'],df['zn'],df['indus'],df['chas'],df['nox'],df['rm'],df['age'],df['dis'],df['rad'],df['tax'],df['ptratio'],df['black'],df['lstat'],df['medv']])

bmean = [0,0,0,0,0,0,0,0,0,0,0,0,0,0]
# Calculating mean
for i in range(14):
    for j in range(333):
        bmean[i] += b[i][j]
for i in range(14):
    bmean[i] /= 333
bsigma = [0,0,0,0,0,0,0,0,0,0,0,0,0,0]
# Calculating standard deviation
for i in range(14):
    for j in range(333):
        bsigma[i] += (b[i][j]-bmean[i])**2
for i in range(14):
    bsigma[i] = (bsigma[i]/333)**0.5

# NOTE :- Replace 13 and 333 with dimensions of list+1
# Normalising data
for i in range(14):
    for j in range(333):
        b[i][j] = (b[i][j]-bmean[i])/bsigma[i]

theta = [0,0,0,0,0,0,0,0,0,0,0,0,0]


def costfun(theta ,b ):
    hypo = 0
    cost = 0
    y = 0
    for j in range(332):
        for i in range(13):
           hypo+=(theta[i]*b[i][j])
           y += b[13][j]
        cost+=(hypo-y)**2
        hypo = 0
        y = 0
    cost/=333
    return cost

print(b)
def GradientDescent(theta, b):
    alpha = 0.1
    hypo = 0
    cost = 0
    y = 0
    while (costfun(theta, b) > 1):
        for j in range(332):
            for i in range(13):
                hypo += (theta[i] * b[i][j])
                y += b[13][j]
            for i in range(13):
                theta[i] = theta[i]-((hypo-y)*b[i][j]*alpha/333)
    return theta
print(bmean)

print(GradientDescent(theta,b))

1 Answer:

Answer 0 (score: 0):

There is a lot to say about this code, so let's start from the beginning:

0) There are many existing implementations of linear regression. To name a few:

  • sklearn.linear_model.LinearRegression
  • scipy.stats.linregress
  • probably many others, but who cares, those are the best
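For instance, a minimal sketch with sklearn's `LinearRegression` (the toy data and coefficients below are illustrative, not from your `train.csv`):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated exactly from y = 3*x0 + 2*x1 + 1
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [4.0, 3.0]])
y = 3 * X[:, 0] + 2 * X[:, 1] + 1

model = LinearRegression().fit(X, y)
print(model.coef_)       # recovers [3., 2.] up to floating point
print(model.intercept_)  # recovers 1.0 up to floating point
```

No loops, no hand-rolled gradient descent, and the fit is one line.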

1) If you're doing math, use numpy: its built-in vectorised operations save you from many error-prone loops. For example, here is the equivalent of the first ~30 lines of your code:

import numpy as np

# Generate toy data, mu=10, sigma=5
M, N = 1000, 14
features = np.random.normal(10, 5, size=(M,N))

# Normalise
norm = (features - features.mean(axis=0)) / features.std(axis=0)

# Initialise theta
theta = np.zeros(N)

(Also, I should point out that almost the same operations work directly on a pandas DataFrame, so why bother converting to lists at all?)
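To sketch that point, the same z-score normalisation on a hypothetical DataFrame (random data standing in for `train.csv`; note `ddof=0` matches your code, which divides by the full sample count of 333):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same shape as the question's data
df = pd.DataFrame(np.random.normal(10, 5, size=(333, 14)))

# Column-wise z-score normalisation in one line, no loops
norm = (df - df.mean()) / df.std(ddof=0)
```

Every normalised column now has mean 0 and standard deviation 1.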

2) The same goes for your cost function. Linear regression should be as simple as Y = aX + b.
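For illustration, the whole cost function collapses to a couple of numpy lines (the design matrix `X` and target `y` below are made up to give an exact fit):

```python
import numpy as np

def cost(theta, X, y):
    """Mean squared error of the linear hypothesis X @ theta."""
    residuals = X @ theta - y
    return (residuals ** 2).mean()

# Tiny check: first column is the intercept term, y = 1 + x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = np.array([1.0, 1.0])
print(cost(theta, X, y))  # 0.0 -- an exact fit has zero cost
```

Compare that with the triple-nested loops (and the accidental accumulation of `y` across features) in your `costfun`.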

3) Your while loop has no stopping condition other than the cost check. In other words, if your algorithm does not converge (which, if you ask me, is very likely), it will never stop...
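A common fix, sketched below, is to pair the convergence check with an iteration cap, so the loop always terminates (the `alpha`, `tol` and `max_iter` values are illustrative, as is the toy data):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, tol=1e-10, max_iter=10_000):
    """Batch gradient descent with two stopping conditions:
    the cost has stopped improving (below tol), OR we hit max_iter."""
    theta = np.zeros(X.shape[1])
    prev_cost = np.inf
    for _ in range(max_iter):
        residuals = X @ theta - y
        current_cost = (residuals ** 2).mean()
        if prev_cost - current_cost < tol:  # converged (or diverging)
            break
        prev_cost = current_cost
        theta -= alpha * (X.T @ residuals) / len(y)
    return theta

# Toy data: first column is the intercept term, y = 1 + 2x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
theta = gradient_descent(X, y)
print(theta)  # close to [1., 2.]
```

Either condition alone is fragile: the cost check never fires if the run diverges, and the cap alone can return an unconverged answer, so use both.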

Here is an example of how to implement linear regression in python: https://www.cs.toronto.edu/~frossard/post/linear_regression/