Cost function and gradient seem to work, but the scipy.optimize functions do not

Time: 2017-08-15 23:53:47

Tags: python pandas scipy logistic-regression

I am working through Andrew Ng's Coursera course by taking my Matlab code and converting it to Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc; after some Googling I found a couple of options. They both return the same results, but those results do not match the expected output in Andrew Ng's code. Others seem to have gotten this working, so I am wondering why my particular code does not return the desired result when using the scipy.optimize functions, even though it does for the cost and gradient sections earlier in the code.

The data I am using can be found at the link below:

ex2data1

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op


#Machine Learning Online Class - Exercise 2: Logistic Regression

#Load Data
#The first two columns contain the exam scores and the third column contains the label.

data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2]) #100 x 2
y = np.array(data.iloc[:,2]) #100 x 1
y.shape = (len(y), 1)


#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]


#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand
#the problem we are working with.

print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')

plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()


def sigmoid(z):
    '''
    SIGMOID Compute sigmoid function
    g = SIGMOID(z) computes the sigmoid of z.
    Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    vector or scalar).
    '''
    g = 1 / (1 + np.exp(-z))
    return g


def costFunction(theta, X, y):
    '''
    COSTFUNCTION Compute cost and gradient for logistic regression
    J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    parameter for logistic regression and the gradient of the cost
    w.r.t. to the parameters.
    '''
    m = len(y) #number of training examples

    h = sigmoid(X.dot(theta)) #logistic regression hypothesis
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))

    #h is 100x1 and y is 100x1; these are two vectors we subtract elementwise,
    #then we sum the values over the rows
    #cost function for logistic regression
    return J

def gradient(theta, X, y):
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)): #number of rows in theta
        XT = X[:,i]
        XT.shape = (len(X),1)
        grad[i] = (1/m) * np.sum((h-y)*XT) #updating each row of the gradient
    return grad


#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You need to complete the code in costFunction.m


#Add intercept term to x and X_test
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))


#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))


#Compute and display initial cost and gradient
(cost, grad) = costFunction(initial_theta, X, y), gradient(initial_theta, X, y)

print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')


#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]])
(cost, grad) = costFunction(test_theta, X, y), gradient(test_theta, X, y)

print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')


result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]


Result = op.minimize(fun = costFunction, 
                                 x0 = initial_theta, 
                                 args = (X, y),
                                 method = 'TNC',
                                 jac = gradient, options={'gtol': 1e-3, 'disp': True, 'maxiter': 1000})


theta = Result.x
theta

test = np.array([[1, 45, 85]]) 
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')

2 Answers:

Answer 0 (score: 3)

This was a tricky one to debug, and it illustrates a poorly documented aspect of the scipy.optimize interface. The documentation only vaguely indicates that theta will be passed in as a vector:

Minimization of scalar function of one or more variables.

In general, the optimization problems are of the form:

minimize f(x) subject to

g_i(x) >= 0,  i = 1,...,m
h_j(x)  = 0,  j = 1,...,p

where x is a vector of one or more variables.

The important part is that they really mean vector in the most literal sense: a 1-d array. So you have to expect that whenever theta is passed into one of your callbacks, it will arrive as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, differently from 2-d column arrays).
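To make the pitfall concrete, here is a small illustrative numpy snippet (not from the original code) showing how mixing a 1-d array with a 2-d column array silently broadcasts into a full matrix instead of doing elementwise subtraction:

import numpy as np

y = np.zeros((5, 1))     # a column array, like y in the question
h_2d = np.ones((5, 1))   # what h looks like when theta is a 2-d column array
h_1d = np.ones(5)        # what h looks like when theta arrives as a 1-d array

print((h_2d - y).shape)  # (5, 1) -- elementwise subtraction, as intended
print((h_1d - y).shape)  # (5, 5) -- broadcasting silently produces a matrix

In the question's code, y keeps its (100, 1) shape, so this kind of accidental broadcasting is the likely culprit once the optimizer starts passing theta in as a 1-d array.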

I don't know exactly why this causes a problem in your case, but in any case it is easy to fix. You just have to add the following at the top of both your cost function and your gradient function:

theta = theta.reshape(-1, 1)                                           

This guarantees that theta will be a 2-d column array, as expected. Once you have done this, the results are correct.
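For concreteness, a sketch of the question's two functions with just that line added at the top of each (everything else is kept as in the question, except that the feature column is taken with reshape rather than an in-place shape assignment):

def costFunction(theta, X, y):
    theta = theta.reshape(-1, 1)  # force theta into a 2-d column array
    m = len(y)
    h = sigmoid(X.dot(theta))
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    return J

def gradient(theta, X, y):
    theta = theta.reshape(-1, 1)  # same one-line fix here
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)):
        XT = X[:, i].reshape(len(X), 1)  # i-th feature column as a 100 x 1 array
        grad[i] = (1/m) * np.sum((h - y) * XT)
    return grad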

Answer 1 (score: 0)

I had a similar issue with Scipy while working on the same problem as you. As the other answer points out, the interface is not the easiest to deal with, especially in combination with the numpy array interface... Here is my implementation, which works as expected.

Define the cost and gradient functions

Note that initial_theta is passed in as a plain array of shape (3,) and is converted into a column vector of shape (3,1) inside the functions. The gradient function then returns grad.ravel(), which has shape (3,) again. This is important, as doing otherwise caused error messages with various optimization methods in scipy.optimize.

Note that different methods have different behaviours, but returning .ravel() seems to fix most of the issues...
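For reference, a tiny illustrative snippet (not part of the original answer) showing what .ravel() does to a column-shaped gradient:

import numpy as np

g = np.zeros((3, 1))     # gradient as a column array
print(g.shape)           # (3, 1)
print(g.ravel().shape)   # (3,) -- the flat 1-d shape the optimizer works with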

import pandas as pd
import numpy as np
import scipy.optimize as opt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def CostFunc(theta,X,y):

    #Initializing variables
    m = len(y)
    J = 0
    grad = np.zeros(theta.shape)

    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    J = (1/m) * ((-y.T @ np.log(h)) - (1 - y).T @ np.log(1-h))

    return J

def Gradient(theta,X,y):

    #Initializing variables
    m = len(y)
    theta = theta[:,np.newaxis]
    grad = np.zeros(theta.shape)

    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m) * (X.T @ (h - y))

    return grad.ravel() #<-- This is the trick

Initialize the variables and parameters

Note that initial_theta.shape returns (3,)

data1 = pd.read_csv('ex2data1.txt', header = None)  # load the same data file as in the question
X = data1.iloc[:,0:2].values
m,n = X.shape
X = np.concatenate((np.ones(m)[:,np.newaxis],X),1)
y = data1.iloc[:,-1].values[:,np.newaxis]
initial_theta = np.zeros((n+1))

Call scipy.optimize

model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
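As a hypothetical follow-up (not part of the original answer), the fitted parameters can then be read from model.x and plugged into the prediction check from the exercise:

theta_opt = model.x  # optimized parameters, shape (3,)

# admission probability for a student with exam scores 45 and 85
test = np.array([1, 45, 85])
prob = sigmoid(test @ theta_opt)
print('Predicted admission probability: %f (expected approx. 0.775)' % prob)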

People more knowledgeable than me are welcome to comment; this scipy interface is a mystery to me. Thanks.