Gradient for fmin_tnc not working

Asked: 2018-07-04 12:17:37

Tags: python-3.x numpy machine-learning scipy logistic-regression

I am training a multi-class logistic regression model for handwritten digit recognition. To minimize the cost function I am using fmin_tnc. I have implemented the gradient function as follows:

    def gradient(theta, *args):
        X, y, lamda = args
        m = np.size(X, 0)
        h = X.dot(theta)
        grad = (1 / m) * X.T.dot(sigmoid(h) - y)
        # regularize every parameter except the bias term
        grad[1:] = grad[1:] + (lamda / m) * theta[1:]
        # flattened because fmin_tnc expects a 1-d array of gradients
        return grad.flatten()
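
The sigmoid helper is not shown in the question; a minimal definition consistent with the code above would be:

    import numpy as np

    def sigmoid(z):
        # element-wise logistic function 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))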

For the small test set given below, this produces the correct gradient values:

    theta_t = np.array([[-2], [-1], [1], [2]])
    X_t = np.array([[1, 0.1, 0.6, 1.1],
                    [1, 0.2, 0.7, 1.2],
                    [1, 0.3, 0.8, 1.3],
                    [1, 0.4, 0.9, 1.4],
                    [1, 0.5, 1, 1.5]])
    y_t = np.array([[1], [0], [1], [0], [1]])
    lamda_t = 3

But when I use the check_grad function from scipy, it reports an error of 0.6222474393497573. I cannot trace why this happens. It may also be the reason why fmin_tnc does not perform any optimization and always returns optimized parameters equal to the initial parameters it was given.
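
For reference, a gradient check along those lines could be run as follows; lrcostfunction stands for the matching cost function (named in the answer below), which is not shown in the question:

    from scipy.optimize import check_grad

    # check_grad compares the analytic gradient against a finite-difference
    # approximation of the cost; a result far from zero signals a mismatch
    err = check_grad(lrcostfunction, gradient, theta_t.ravel(),
                     X_t, y_t.ravel(), lamda_t)
    print(err)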

1 Answer:

Answer 0 (score: 0)

The fmin_tnc function is called as follows:

    optimize.fmin_tnc(func=lrcostfunction, x0=initial_theta, fprime=gradient,
                      args=(X, tmp_y.flatten(), lamda))

Since y and theta are passed in as 1-d arrays of shape (n,), they must be reshaped into 2-d arrays of shape (n, 1), because the gradient implementation works with 2-d arrays. The correct implementation is as follows:
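
The mismatch is easy to miss because NumPy broadcasts the shapes silently instead of raising an error; subtracting a (n, 1) column vector from a (n,) array produces an (n, n) matrix, which corrupts the gradient:

    import numpy as np

    h = np.zeros(5)        # 1-d array, shape (5,)
    y = np.zeros((5, 1))   # 2-d column vector, shape (5, 1)
    print((h - y).shape)   # (5, 5) -- silent broadcasting, no error raised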

    def gradient(theta, *args):
        X, y, lamda = args
        # fmin_tnc passes theta (and here y) as flat 1-d arrays;
        # reshape both into column vectors so the 2-d matrix
        # operations below behave as intended
        l = np.size(X, 1)
        theta = np.reshape(theta, (l, 1))
        m = np.size(X, 0)
        y = np.reshape(y, (m, 1))
        h = sigmoid(X.dot(theta))

        grad = (1 / m) * X.T.dot(h - y)
        # regularize every parameter except the bias term
        grad[1:] = grad[1:] + (lamda / m) * theta[1:]

        return grad.ravel()
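
This version can be sanity-checked against the small example from the question, passing the data as the flat 1-d arrays that fmin_tnc actually supplies:

    theta_t = np.array([-2, -1, 1, 2], dtype=float)   # 1-d, as fmin_tnc passes it
    X_t = np.array([[1, 0.1, 0.6, 1.1],
                    [1, 0.2, 0.7, 1.2],
                    [1, 0.3, 0.8, 1.3],
                    [1, 0.4, 0.9, 1.4],
                    [1, 0.5, 1.0, 1.5]])
    y_t = np.array([1, 0, 1, 0, 1], dtype=float)
    print(gradient(theta_t, X_t, y_t, 3))             # 1-d gradient of shape (4,)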