Infinite loop or slow computer

Date: 2019-04-06 09:09:13

Tags: python pandas machine-learning linear-regression

I'm running a linear regression problem in Python, but unfortunately my code doesn't produce any results. It shows no errors, yet it never prints an answer either. I don't know whether my computer is just too slow, whether there are too many iterations, or something else.

Expected output: the values of theta. Instead, the program seems stuck in an infinite loop.

Code:

import pandas as p
# Using pandas to read features from our training set
dinit = p.read_csv('train.csv')
df = p.read_csv('train.csv')

b = list([df['crim'],df['zn'],df['indus'],df['chas'],df['nox'],df['rm'],df['age'],df['dis'],df['rad'],df['tax'],df['ptratio'],df['black'],df['lstat'],df['medv']])

bmean = [0,0,0,0,0,0,0,0,0,0,0,0,0,0]
# Calculating mean
for i in range(14):
    for j in range(333):
        bmean[i] += b[i][j]
for i in range(14):
    bmean[i] /= 333
bsigma = [0,0,0,0,0,0,0,0,0,0,0,0,0,0]
# Calculating standard deviation
for i in range(14):
    for j in range(333):
        bsigma[i] += (b[i][j]-bmean[i])**2
for i in range(14):
    bsigma[i] = (bsigma[i]/333)**0.5

# NOTE :- Replace 13 and 333 with dimensions of list+1
# Normalising data
for i in range(14):
    for j in range(333):
        b[i][j] = (b[i][j]-bmean[i])/bsigma[i]

theta = [0,0,0,0,0,0,0,0,0,0,0,0,0]


def costfun(theta ,b ):
    hypo = 0
    cost = 0
    y = 0
    for j in range(332):
        for i in range(13):
           hypo+=(theta[i]*b[i][j])
           y += b[13][j]
        cost+=(hypo-y)**2
        hypo = 0
        y = 0
    cost/=333
    return cost

print(b)
def GradientDescent(theta, b):
    alpha = 0.1
    hypo = 0
    cost = 0
    y = 0
    while (costfun(theta, b) > 1):
        for j in range(332):
            for i in range(13):
                hypo += (theta[i] * b[i][j])
                y += b[13][j]
            for i in range(13):
                theta[i] = theta[i]-((hypo-y)*b[i][j]*alpha/333)
    return theta
print(bmean)

print(GradientDescent(theta,b))

1 Answer:

Answer 0 (score: 0):

There is a lot to say about this code, so let's start from the beginning:

0) There are many existing implementations of linear regression. To name a few:

  • sklearn.linear_model.LinearRegression
  • scipy.stats.linregress
  • probably many others, but who cares, those are the best
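For instance, a minimal sketch with sklearn's `LinearRegression` (the toy data and coefficients below are illustrative, not from your `train.csv`):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated exactly from y = 3*x0 + 2*x1 + 1
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [4.0, 3.0]])
y = 3 * X[:, 0] + 2 * X[:, 1] + 1

model = LinearRegression().fit(X, y)
print(model.coef_)       # recovers [3., 2.] up to floating point
print(model.intercept_)  # recovers 1.0 up to floating point
```

No loops, no hand-rolled gradient descent, and the fit is one line.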

1) If you're doing math, use numpy: its built-in vectorised operations save you from many error-prone loops. For example, here is the equivalent of the first ~30 lines of your code:

import numpy as np

# Generate toy data, mu=10, sigma=5
M, N = 1000, 14
features = np.random.normal(10, 5, size=(M,N))

# Normalise
norm = (features - features.mean(axis=0)) / features.std(axis=0)

# Initialise theta
theta = np.zeros(N)

(Also, I should point out that almost the same operations work directly on a pandas DataFrame, so why bother converting to lists at all?)
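To sketch that point, the same z-score normalisation on a hypothetical DataFrame (random data standing in for `train.csv`; note `ddof=0` matches your code, which divides by the full sample count of 333):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same shape as the question's data
df = pd.DataFrame(np.random.normal(10, 5, size=(333, 14)))

# Column-wise z-score normalisation in one line, no loops
norm = (df - df.mean()) / df.std(ddof=0)
```

Every normalised column now has mean 0 and standard deviation 1.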

2) The same goes for your cost function. Linear regression should be as simple as Y = aX + b.
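For illustration, the whole cost function collapses to a couple of numpy lines (the design matrix `X` and target `y` below are made up to give an exact fit):

```python
import numpy as np

def cost(theta, X, y):
    """Mean squared error of the linear hypothesis X @ theta."""
    residuals = X @ theta - y
    return (residuals ** 2).mean()

# Tiny check: first column is the intercept term, y = 1 + x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = np.array([1.0, 1.0])
print(cost(theta, X, y))  # 0.0 -- an exact fit has zero cost
```

Compare that with the triple-nested loops (and the accidental accumulation of `y` across features) in your `costfun`.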

3) Your while loop has no stopping condition other than the cost check. In other words, if your algorithm does not converge (which, if you ask me, is very likely), it will never stop...
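A common fix, sketched below, is to pair the convergence check with an iteration cap, so the loop always terminates (the `alpha`, `tol` and `max_iter` values are illustrative, as is the toy data):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, tol=1e-10, max_iter=10_000):
    """Batch gradient descent with two stopping conditions:
    the cost has stopped improving (below tol), OR we hit max_iter."""
    theta = np.zeros(X.shape[1])
    prev_cost = np.inf
    for _ in range(max_iter):
        residuals = X @ theta - y
        current_cost = (residuals ** 2).mean()
        if prev_cost - current_cost < tol:  # converged (or diverging)
            break
        prev_cost = current_cost
        theta -= alpha * (X.T @ residuals) / len(y)
    return theta

# Toy data: first column is the intercept term, y = 1 + 2x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
theta = gradient_descent(X, y)
print(theta)  # close to [1., 2.]
```

Either condition alone is fragile: the cost check never fires if the run diverges, and the cap alone can return an unconverged answer, so use both.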

Here is an example of how to implement linear regression in python: https://www.cs.toronto.edu/~frossard/post/linear_regression/