I am training a multi-class logistic regression classifier for handwritten digit recognition. To minimize the cost function I am using fmin_tnc. I have implemented the gradient function as follows:
import numpy as np

def sigmoid(z):
    # standard logistic function
    return 1 / (1 + np.exp(-z))

def gradient(theta, *args):
    X, y, lamda = args
    m = np.size(X, 0)                         # number of training examples
    h = X.dot(theta)                          # linear scores
    grad = (1 / m) * X.T.dot(sigmoid(h) - y)
    # regularize every parameter except the bias term theta[0]
    grad[1:] = grad[1:] + (lamda / m) * theta[1:]
    # flattened because fmin_tnc accepts a flat list of gradients
    return grad.flatten()
For the small test set provided below, this produces the correct gradient values:
theta_t = np.array([[-2], [-1], [1], [2]])
X_t = np.array([[1, 0.1, 0.6, 1.1],
                [1, 0.2, 0.7, 1.2],
                [1, 0.3, 0.8, 1.3],
                [1, 0.4, 0.9, 1.4],
                [1, 0.5, 1.0, 1.5]])
y_t = np.array([[1], [0], [1], [0], [1]])
lamda_t = 3
But when using the check_grad function from scipy, it reports an error of 0.6222474393497573, and I cannot trace why. This may be the reason fmin_tnc performs no optimization and always returns optimized parameters equal to the initial parameters it is given.
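For reference, check_grad compares the analytic gradient against a finite-difference approximation of the cost. A minimal sketch of such a check on the test set above, assuming the lrcostfunction referenced later in this question (its implementation is not shown here); note that check_grad, like fmin_tnc, passes the parameters as a flat array of shape (n,):

from scipy import optimize

# check_grad perturbs each parameter numerically and compares the result
# with the analytic gradient; theta is passed as a flat (n,) array
err = optimize.check_grad(lrcostfunction, gradient, theta_t.ravel(),
                          X_t, y_t.ravel(), lamda_t)
print(err)  # close to 0 only if the gradient is consistent with the cost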
Answer 0 (score 0):
The fmin_tnc function is called as follows:
optimize.fmin_tnc(func=lrcostfunction, x0=initial_theta, fprime=gradient,
                  args=(X, tmp_y.flatten(), lamda))
Since y and theta are passed in as 1-d arrays of size (n,), they should be reshaped into 2-d arrays of size (n, 1), because the gradient function implementation assumes 2-d arrays. The correct implementation is as follows:
def gradient(theta, *args):
    X, y, lamda = args
    l = np.size(X, 1)                  # number of features
    theta = np.reshape(theta, (l, 1))  # theta arrives flat; reshape to (n, 1)
    m = np.size(X, 0)                  # number of training examples
    y = np.reshape(y, (m, 1))          # y reshaped for the same reason
    h = sigmoid(X.dot(theta))          # hypothesis values
    grad = (1 / m) * X.T.dot(h - y)
    # regularize every parameter except the bias term theta[0]
    grad[1:] = grad[1:] + (lamda / m) * theta[1:]
    return grad.ravel()                # flat array, as fmin_tnc expects
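With the reshaping in place, a minimal end-to-end sketch on the small test set from the question (again assuming lrcostfunction reshapes its flat 1-d inputs the same way):

initial_theta = np.zeros(np.size(X_t, 1))   # flat (n,) start point, as fmin_tnc expects
result = optimize.fmin_tnc(func=lrcostfunction, x0=initial_theta,
                           fprime=gradient, args=(X_t, y_t.flatten(), lamda_t))
theta_opt, nfeval, rc = result              # fmin_tnc returns (solution, nfeval, return code)
print(theta_opt)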