I am porting my Matlab code from Andrew Ng's Coursera course to Python. I am working on unregularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc; after some googling I found a couple of options. They both return the same result, but it does not match the expected result given in Andrew Ng's code. Others seem to have gotten this working, so I am wondering why my particular code does not return the desired result when using the scipy.optimize functions, even though the cost and gradient parts earlier in the code do work.
The data I am using can be found at the link below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op
#Machine Learning Online Class - Exercise 2: Logistic Regression
#Load Data
#The first two columns contains the exam scores and the third column contains the label.
data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2]) #100 x 2 (the intercept column is added later)
y = np.array(data.iloc[:,2]) #100 x 1 after the reshape below
y.shape = (len(y), 1)
#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]
#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand the
#the problem we are working with.
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')
plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()
def sigmoid(z):
    '''
    SIGMOID Compute sigmoid function
    g = SIGMOID(z) computes the sigmoid of z.
    Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    vector or scalar).
    '''
    g = 1 / (1 + np.exp(-z))
    return g
def costFunction(theta, X, y):
    '''
    COSTFUNCTION Compute cost and gradient for logistic regression
    J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    parameter for logistic regression and the gradient of the cost
    w.r.t. to the parameters.
    '''
    m = len(y) #number of training examples
    h = sigmoid(X.dot(theta)) #logistic regression hypothesis
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    #h is 100x1, y is 100x1; these end up as two vectors we subtract from each other
    #then we sum the values over the rows
    #cost function for logistic regression
    return J
def gradient(theta, X, y):
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)): #number of rows in theta
        XT = X[:,i]
        XT.shape = (len(X),1)
        grad[i] = (1/m) * np.sum((h-y)*XT) #updating each row of the gradient
    return grad
#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You need to complete the code in costFunction.m
#Add intercept term to x and X_test
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))
#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))
#Compute and display initial cost and gradient
(cost, grad) = costFunction(initial_theta, X, y), gradient(initial_theta, X, y)
print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')
#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]]);
(cost, grad) = costFunction(test_theta, X, y), gradient(test_theta, X, y)
print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]
Result = op.minimize(fun = costFunction,
                     x0 = initial_theta,
                     args = (X, y),
                     method = 'TNC',
                     jac = gradient,
                     options = {'gtol': 1e-3, 'disp': True, 'maxiter': 1000})
theta = Result.x
theta
test = np.array([[1, 45, 85]])
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')
Answer 0 (score: 3)
This was a very tricky problem to debug, and it illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed in as a vector:
Minimization of scalar function of one or more variables.
In general, the optimization problems are of the form:
minimize f(x) subject to g_i(x) >= 0, i = 1,...,m; h_j(x) = 0, j = 1,...,p
where x is a vector of one or more variables.
The important point is that they really mean vector in the most literal sense: a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
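As a hypothetical illustration of the kind of mismatch this can cause (toy arrays, not necessarily the exact failure here): subtracting a 2-d column array from a 1-d array broadcasts into a full matrix rather than an element-wise difference.
import numpy as np
h = np.array([0.2, 0.5, 0.9])            #1-d array, shape (3,), like a hypothesis computed from a 1-d theta
y_col = np.array([[0], [1], [1]])        #2-d column array, shape (3, 1), like y in the question
print((h - y_col).shape)                 #(3, 3) -- broadcasting produces a matrix, not the expected (3,)
print((h.reshape(-1, 1) - y_col).shape)  #(3, 1) -- with both as columns the difference is element-wise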
I don't know exactly why it causes a problem in your case, but either way it is easily resolved. You just have to add the following at the top of both your cost function and your gradient function:
theta = theta.reshape(-1, 1)
This guarantees that theta will be a 2-d column array, as expected. Once you have done this, the results are correct.
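For illustration, here is a minimal sketch of the question's two functions with that single line added (everything else is the question's original code, unchanged):
def costFunction(theta, X, y):
    theta = theta.reshape(-1, 1) #force theta into a 2-d column array, even if scipy passes a 1-d array
    m = len(y)
    h = sigmoid(X.dot(theta))
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    return J

def gradient(theta, X, y):
    theta = theta.reshape(-1, 1) #same fix here; the loop below is the question's original code
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)):
        XT = X[:,i]
        XT.shape = (len(X),1)
        grad[i] = (1/m) * np.sum((h-y)*XT)
    return grad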
Answer 1 (score: 0)
I have had similar issues with Scipy while dealing with the same problem as you. As the previous answer points out, the interface is not the easiest to deal with, especially in combination with the numpy array interface... Here is my implementation, which works as expected.
Note that initial_theta is passed as a simple array of shape (3,) and is converted to a column vector of shape (3,1) within the function. The gradient function then returns grad.ravel(), which again has shape (3,). This is important, as doing otherwise caused error messages with various optimization methods in Scipy.optimize.
Note that different methods have different behaviours, but returning .ravel() seems to fix most issues...
import pandas as pd
import numpy as np
import scipy.optimize as opt
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def CostFunc(theta,X,y):
    #Initializing variables
    m = len(y)
    J = 0
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    J = (1/m) * ((-y.T @ np.log(h)) - (1 - y).T @ np.log(1-h))
    return J
def Gradient(theta,X,y):
    #Initializing variables
    m = len(y)
    theta = theta[:,np.newaxis]
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m)*(X.T @ (h - y))
    return grad.ravel() #<-- This is the trick
Note that initial_theta.shape returns (3,).
data1 = pd.read_csv('ex2data1.txt', header = None) #assuming the same data file as in the question
X = data1.iloc[:,0:2].values
m,n = X.shape
X = np.concatenate((np.ones(m)[:,np.newaxis],X),1)
y = data1.iloc[:,-1].values[:,np.newaxis]
initial_theta = np.zeros((n+1))
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
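Assuming the optimization converges, a short usage sketch for checking the result (model is the OptimizeResult returned above; theta_opt is just a hypothetical name, and 0.775 is the expected value quoted in the question):
theta_opt = model.x #optimized parameters, shape (3,)
print('theta:', theta_opt)
prob = sigmoid(np.array([1, 45, 85]) @ theta_opt) #exam scores 45 and 85, with the intercept term prepended
print('admission probability: %f' % prob) #should come out near 0.775 if the fit matches the exercise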
Anyone more knowledgeable is welcome to comment; this Scipy interface is still a bit of a mystery to me. Thanks.