Logistic regression with gradient descent error

Time: 2014-07-22 10:55:07

Tags: machine-learning octave logistic-regression gradient-descent

I am trying to implement logistic regression with gradient descent.

I compute my cost function j_theta at each iteration and, fortunately, j_theta is decreasing when plotted against the number of iterations.
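For reference, j_theta is the standard logistic regression cost, which the nested loops in the code below accumulate term by term:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log h_\theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big], \qquad h_\theta(x)=\frac{1}{1+e^{-\theta^{T}x}}$$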

The data set I am using is as follows:

x=
1   20   30
1   40   60
1   70   30
1   50   50
1   50   40
1   60   40
1   30   40
1   40   50
1   10   20
1   30   40
1   70   70

y=   0
     1
     1
     1
     0
     1
     0
     0
     0
     0
     1

The logistic regression code I managed to write using gradient descent is:

%1. The code below loads the data file from your desktop into Octave memory
x=load('stud_marks.dat');
%y=load('ex4y.dat');
y=x(:,3);
x=x(:,1:2);


%2. Now we want to add a column x0 with all the rows as value 1 into the matrix.
%First take the length
[m,n]=size(x);
x=[ones(m,1),x];

X=x;


%   Now we feature scale x1 and x2. We skip the first column x0 because it should stay as 1.
mn = mean(x);
sd = std(x);
x(:,2) = (x(:,2) - mn(2))./ sd(2);
x(:,3) = (x(:,3) - mn(3))./ sd(3);

% We will not use the vectorized technique because it is hard to debug; we use nested for loops instead

max_iter=50;

theta = zeros(size(x(1,:)))'; 
j_theta=zeros(max_iter,1);         

for num_iter=1:max_iter
  % We calculate the cost Function
  j_cost_each=0;
  alpha=1;
  theta
    for i=1:m
        z=0;
        for j=1:n+1
%            theta(j)
            z=z+(theta(j)*x(i,j));  
            z
        end
        h= 1.0 ./(1.0 + exp(-z));
        j_cost_each=j_cost_each + ( (-y(i) * log(h)) -  ((1-y(i)) * log(1-h)) );  
%       j_cost_each
    end  
    j_theta(num_iter)=(1/m) * j_cost_each;

    for j=1:n+1
        grad(j) = 0;
        for i=1:m
            z=(x(i,:)*theta);  
            z            
            h=1.0 ./ (1.0 + exp(-z));
            h
            grad(j) += (h-y(i)) * x(i,j); 
        end
        grad(j)=grad(j)/m;
        grad(j)
        theta(j)=theta(j)- alpha * grad(j);
    end
end      

figure
plot(1:max_iter, j_theta, 'b', 'LineWidth', 2)
hold off


figure
%3. In this step we plot the given input data set just to see how the two classes are distributed.
pos = find(y == 1);  % Indices in y of all examples whose class value is 1
neg = find(y == 0);  % Similarly, indices in y of all examples whose class value is 0
 % Now we plot the graph column x1 Vs x2 for y=1 and y=0
plot(x(pos, 2), x(pos,3), '+'); 
hold on
plot(x(neg, 2), x(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('pass', 'Failed')


plot_x = [min(x(:,2))-2,  max(x(:,2))+2];     % This min and max decides the length of the decision graph.
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off

%%%%%%% The only difference is that in the last plot I used X, whereas now I use x, whose features are feature scaled %%%%%%%%%%%

If you look at the plot of x1 vs x2, it looks like the following:

[Image: plot of x1 vs x2]

After running my code I plotted the decision boundary. The shape of the decision line seems okay, but it is somewhat shifted. The plot of x1 vs x2 with the decision boundary is as follows:

[Image: plot of x1 vs x2 with the decision boundary]

Please suggest where I am going wrong.

Thanks :)

New graph:

[Image: new plot with the feature-scaled x axis]


If you look at the new graph, the coordinates of the x axis have changed. That is because I use x (feature scaled) instead of X.
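Just to illustrate the scaling point (not claiming this is the fix): since theta is learned on the scaled features, the boundary can also be drawn back in the original units by inverting the scaling with the mn and sd computed above. A minimal sketch, assuming mn, sd, the unscaled matrix X, pos, neg and the learned theta from the code above are in scope:

% Illustration only: draw the learned boundary in the original, unscaled units
% by undoing the feature scaling (assumes mn, sd, X, pos, neg, theta exist as above).
plot_x_orig   = [min(X(:,2))-2, max(X(:,2))+2];             % x1 range in original units
plot_x_scaled = (plot_x_orig - mn(2)) ./ sd(2);             % scale x1 the same way as in training
plot_y_scaled = (-1./theta(3)) .* (theta(2).*plot_x_scaled + theta(1));  % boundary in scaled units
plot_y_orig   = plot_y_scaled .* sd(3) + mn(3);             % map x2 back to original units
figure
plot(X(pos,2), X(pos,3), '+');
hold on
plot(X(neg,2), X(neg,3), 'o');
plot(plot_x_orig, plot_y_orig)
hold off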

1 Answer:

Answer 0 (score: 4):

The problem lies in the cost function calculation and/or the gradient calculation; your plotting code is fine. I ran your data set through my own implementation of logistic regression, but using the vectorized technique, because in my opinion it is easier to debug. The final values I got were

theta = [-76.4242, 0.8214, 0.7948]. I also used alpha = 0.3.

I plotted the decision boundary and it looks fine. I suggest using the vectorized form, since in my opinion it is easier to implement and debug.

[Image: Decision Boundary]
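A minimal sketch of what a vectorized cost and gradient descent update could look like in Octave (a sketch only, not the answerer's exact code; X_s here stands for the scaled design matrix x built in the question, and the iteration count is an assumption):

% Minimal vectorized sketch (not the answerer's exact code).
% Assumes X_s is the m x 3 scaled design matrix with a leading column of ones
% and y is the m x 1 label vector, as built in the question's code.
m     = length(y);
alpha = 0.3;                                   % learning rate reported in the answer
theta = zeros(size(X_s, 2), 1);
num_iters = 1500;                              % assumption: far more than 50 iterations
J = zeros(num_iters, 1);
for iter = 1:num_iters
  h       = 1.0 ./ (1.0 + exp(-X_s * theta));                    % sigmoid of all examples at once
  J(iter) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));   % vectorized cost
  grad    = (1/m) * (X_s' * (h - y));                            % vectorized gradient
  theta   = theta - alpha * grad;                                % simultaneous update of all theta(j)
end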

I also think your gradient descent implementation is not quite right: 50 iterations are not enough, and the cost at the last iteration is not good enough. Maybe you should try running it for more iterations with a stopping condition. Also see this lecture for optimization tricks: https://class.coursera.org/ml-006/lecture/37
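As a sketch of the stopping-condition idea (the tolerance value and the iteration cap below are assumptions, not from the answer; X_s, y, m, alpha and theta are as in the sketch above):

% Sketch of a convergence-based stopping condition (tolerance and cap are assumptions).
tol      = 1e-6;
max_iter = 100000;
J_prev   = Inf;
for iter = 1:max_iter
  h = 1.0 ./ (1.0 + exp(-X_s * theta));
  J = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
  if abs(J_prev - J) < tol                     % stop once the cost barely changes
    break
  end
  J_prev = J;
  theta  = theta - alpha * (1/m) * (X_s' * (h - y));
end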