Question

I have the following code to minimize a cost function using its gradient.

import scipy.optimize
from numpy import random, shape, zeros

def trainLinearReg( X, y, lamda ):
    # theta = zeros( (shape(X)[1], 1) )
    theta = random.rand( shape(X)[1], 1 ) # random initialization of theta

    result = scipy.optimize.fmin_cg( computeCost, fprime = computeGradient, x0 = theta, 
                                     args = (X, y, lamda), maxiter = 200, disp = True, full_output = True )
    # with full_output = True: result[0] is the optimal theta, result[1] is the final cost
    return result[1], result[0]

But I get this warning:

Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 8403387632289934651424768.000000
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
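Because the call uses full_output = True, fmin_cg also returns a warnflag that says which stopping condition was hit; "Iterations: 0" suggests the very first line search already failed. A minimal sketch of reading that flag, using a made-up, well-scaled quadratic objective (f, fprime and c are illustrative names, not from the question):

```python
import numpy as np
from scipy.optimize import fmin_cg

# Hypothetical toy objective: f(x) = ||x - c||^2, minimized at x = c
c = np.array([1.0, -2.0, 3.0])

def f(x):
    return float(np.sum((x - c) ** 2))

def fprime(x):
    return 2.0 * (x - c)

xopt, fopt, func_calls, grad_calls, warnflag = fmin_cg(
    f, x0=np.zeros(3), fprime=fprime, full_output=True, disp=False)

# Meaning of warnflag as documented for fmin_cg
messages = {0: "success",
            1: "maximum number of iterations exceeded",
            2: "function/gradient not changing (precision loss)",
            3: "NaN result encountered"}
print(warnflag, messages.get(warnflag))
```

With the question's data, a flag of 2 together with a cost around 8.4e24 is a hint to look at the scale of X and y before changing the optimizer.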

My computeCost and computeGradient are defined as:

def computeCost( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0
    grad  = zeros( shape(theta) )

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))

    return J[0]

def computeGradient( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0
    grad  = zeros( shape(theta) )

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    grad = (1.0 / m) * (X.T.dot(h - y)) + (lamda / m) * theta

    return grad.flatten()

I have already reviewed these similar questions:

scipy.optimize.fmin_bfgs: "Desired error not necessarily achieved due to precision loss"

scipy.optimize.fmin_cg: "Desired error not necessarily achieved due to precision loss."

scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"

But I still can't solve my problem. How can I make the minimization converge instead of getting stuck right at the start?


Answer 1

Solution:

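I solved this problem thanks to @lejlot's comment. He was right: the values in the dataset X were huge because I did not assign the normalized values back to the right variable. Even though it is a small mistake, it shows where to look when you run into this kind of problem: an excessively large cost function value points to a problem with the dataset.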

The earlier, wrong version:

X_poly            = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)   # normalized features go into X_norm ...
X_poly            = c_[ones((m, 1)), X_poly]   # ... but the raw, unnormalized X_poly is used here

The correct version:

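X_poly            = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)   # overwrite X_poly with the normalized features
X_poly            = c_[ones((m, 1)), X_poly]   # prepend the bias column to the normalized features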

where X_poly is then used in the following call:

cost, theta = trainLinearReg(X_poly, y, lamda)
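To see why the mix-up matters, here is a sketch with hypothetical stand-ins for polyFeatures and featureNormalize (the thread does not show them; this assumes the usual "powers of x" and "zero mean, unit standard deviation" definitions) applied to made-up inputs:

```python
import numpy as np

def polyFeatures(X, p):
    # Hypothetical helper: columns X, X**2, ..., X**p for a 1-D input X
    X = np.asarray(X, dtype=float).reshape(-1, 1)
    return np.hstack([X ** k for k in range(1, p + 1)])

def featureNormalize(X):
    # Hypothetical helper: scale each column to zero mean, unit standard deviation
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    return (X - mu) / sigma, mu, sigma

X = np.linspace(-50, 50, 12)   # made-up raw inputs
p = 8
m = X.shape[0]

X_poly = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)   # keep the normalized values
X_poly = np.c_[np.ones((m, 1)), X_poly]         # prepend the bias column

print(np.abs(polyFeatures(X, p)).max())  # raw x**8 column reaches 50**8, about 3.9e13
print(np.abs(X_poly).max())              # normalized features stay of order 1
```

Feeding the raw columns instead makes the squared errors, and therefore the cost, astronomically large, which matches the cost value in the warning.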

Answer 2

In my implementation, scipy.optimize.fmin_cg also failed with the above error for some initial guesses. I then switched to the BFGS method, and it converged.

 scipy.optimize.minimize(fun, x0, args=(), method='BFGS', jac=None, tol=None, callback=None, options={'disp': False, 'gtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': None, 'norm': inf})

It seems this error is sometimes unavoidable with CG: CG can end up with a non-descent direction.
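A minimal sketch of the BFGS route through scipy.optimize.minimize, passing the analytic gradient as jac and a 1-D starting point. The synthetic data and the names cost and grad are assumptions for illustration; the closed-form solution of the regularized normal equations is computed only as a cross-check:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 50, 3
X = np.c_[np.ones(m), rng.standard_normal((m, n - 1))]   # synthetic, well-scaled features
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.standard_normal(m)
lamda = 1.0

def cost(theta, X, y, lamda):
    # regularized squared-error cost, theta is 1-D
    r = X @ theta - y
    return (r @ r + lamda * theta @ theta) / (2 * len(y))

def grad(theta, X, y, lamda):
    # analytic gradient, same 1-D shape as theta
    return (X.T @ (X @ theta - y) + lamda * theta) / len(y)

res = minimize(cost, x0=np.zeros(n), args=(X, y, lamda), jac=grad, method='BFGS')

# Cross-check: (X^T X + lamda I) theta = X^T y
theta_closed = np.linalg.solve(X.T @ X + lamda * np.eye(n), X.T @ y)
print(res.success, res.x, theta_closed)
```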

Answer 3

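I also ran into this problem, and even after a lot of searching nothing worked, because none of the solutions I found clearly addressed it.

Then I read the documentation for scipy.optimize.fmin_cg, which explicitly states that the parameter x0 must be a 1-D array.

My approach was the same as yours: I passed a 2-D matrix as x0, and I always got some precision error or divide-by-zero error together with the same warning.

Then I changed my approach: I passed theta as a 1-D array and converted it into a 2-D matrix inside the computeCost and computeGradient functions. This worked for me, and I got the expected results.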

My solution for logistic regression:

import numpy as np
import scipy.optimize as opt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# X: pandas DataFrame (m rows, features columns); Y: pandas Series/DataFrame of 0/1 labels
m, features = X.shape
theta = np.zeros(features)   # x0 must be a 1-D array

def computeCost(theta, X, Y):
    x = np.asarray(X.values, dtype=float)                  # (m, features)
    y = np.asarray(Y.values, dtype=float).reshape(-1, 1)   # (m, 1)
    theta = theta.reshape(-1, 1)                           # 1-D -> (features, 1) column
    hx = sigmoid(x @ theta)                                # (m, 1)
    cost = y * np.log(hx) + (1 - y) * np.log(1 - hx)
    return -np.sum(cost) / m

def computeGradient(theta, X, Y):
    x = np.asarray(X.values, dtype=float)
    y = np.asarray(Y.values, dtype=float).reshape(-1, 1)
    theta = theta.reshape(-1, 1)
    error = sigmoid(x @ theta) - y                         # (m, 1)
    grad = x.T @ error / m                                 # (features, 1)
    return grad.ravel()                                    # back to 1-D for the optimizer

result = opt.fmin_tnc(func=computeCost, x0=theta, fprime=computeGradient, args=(X, Y))

print(computeCost(result[0], X, Y))

Note again that theta must be a 1-D array.

So, in your code, change the initialization of theta in trainLinearReg to theta = random.randn(features), where features = shape(X)[1].
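Applied to the question's linear regression, the suggestion might look like the sketch below: the optimizer only ever sees a 1-D theta, while the functions reshape it to a column internally and flatten the gradient back. The synthetic, well-scaled data and lamda = 0.0 are assumptions chosen so the expected parameters are known:

```python
import numpy as np
import scipy.optimize

def computeCost(theta, X, y, lamda):
    theta = theta.reshape(-1, 1)          # 1-D from the optimizer -> (n, 1) column
    m = y.shape[0]
    r = X @ theta - y                     # (m, 1) residuals
    J = (r.T @ r + lamda * theta.T @ theta) / (2 * m)
    return J.item()                       # plain float

def computeGradient(theta, X, y, lamda):
    theta = theta.reshape(-1, 1)
    m = y.shape[0]
    grad = (X.T @ (X @ theta - y) + lamda * theta) / m
    return grad.ravel()                   # back to 1-D, same shape as x0

def trainLinearReg(X, y, lamda):
    theta0 = np.random.rand(X.shape[1])   # 1-D random initial guess
    result = scipy.optimize.fmin_cg(computeCost, x0=theta0, fprime=computeGradient,
                                    args=(X, y, lamda), maxiter=200, disp=False,
                                    full_output=True)
    return result[1], result[0]           # (final cost, theta)

# Synthetic, noise-free data (assumption): y = 0.5 + 1.5*x1 - 2.0*x2
rng = np.random.default_rng(1)
X = np.c_[np.ones(40), rng.standard_normal((40, 2))]
y = (X @ np.array([0.5, 1.5, -2.0])).reshape(-1, 1)

cost, theta = trainLinearReg(X, y, lamda=0.0)
print(cost, theta)
```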

Answer 4

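I ran into this problem today.

It turned out that my cost function was implemented incorrectly and produced massive errors, so scipy was asking for more data. I hope this helps someone like me.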
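One cheap way to catch a cost function and gradient that disagree is scipy.optimize.check_grad, which compares the analytic gradient against a finite-difference estimate of the cost. A sketch on made-up data, where buggyGradient is a deliberately broken, hypothetical variant (it drops the 1/m factor):

```python
import numpy as np
from scipy.optimize import check_grad

rng = np.random.default_rng(2)
X = np.c_[np.ones(30), rng.standard_normal((30, 2))]   # made-up data
y = rng.standard_normal(30)
lamda = 1.0

def computeCost(theta, X, y, lamda):
    r = X @ theta - y
    return (r @ r + lamda * theta @ theta) / (2 * len(y))

def computeGradient(theta, X, y, lamda):
    return (X.T @ (X @ theta - y) + lamda * theta) / len(y)

def buggyGradient(theta, X, y, lamda):
    # deliberately wrong: missing the 1/m factor
    return X.T @ (X @ theta - y) + lamda * theta

theta0 = rng.standard_normal(3)
err_ok = check_grad(computeCost, computeGradient, theta0, X, y, lamda)
err_bad = check_grad(computeCost, buggyGradient, theta0, X, y, lamda)
print(err_ok, err_bad)
```

A value near zero means the two functions are consistent; a large value points at a bug in one of them, which can also make the line search fail with the precision-loss warning.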

fmin_cg: Desired error not necessarily achieved due to precision loss
