具有梯度下降的逻辑回归导致不同数据集的不同结果

时间:2014-07-30 11:16:40

标签: machine-learning octave logistic-regression gradient-descent

我正在尝试使用具有两个数据集的梯度下降进行逻辑回归,我得到每个数据集的不同结果。

数据集1输入

X=
  1   2   3
  1   4   6
  1   7   3
  1   5   5
  1   5   4
  1   6   4
  1   3   4
  1   4   5
  1   1   2
  1   3   4
  1   7   7
Y= 
   0
   1
   1
   1
   0
   1
   0
   0
   0
   0
   1

数据集2输入 -

x =

1   20   30
1   40   60
1   70   30
1   50   50
1   50   40
1   60   40
1   30   40
1   40   50
1   10   20
1   30   40
1   70   70

y =

0
1
1
1
0
1
0
0
0
0
1

数据集1和数据集2的差异仅是值的范围。当我为这两个数据集运行我的公共代码时,我的代码为数据集提供了所需的输出,但对于数据集2却非常奇怪。

我的代码如下:

[m,n]=size(x);
x=[ones(m,1),x];

X=x;


%3. In this step we will plot the graph for the given input data set just to see how is the distribution of the two class.
pos = find(y == 1);  % This will take the postion or array number from y for all the class that has value 1 
neg = find(y == 0);  % Similarly this will take the position or array number from y for all class that has value 0
 % Now we plot the graph column x1 Vs x2 for y=1 and y=0
plot(X(pos, 2), X(pos,3), '+'); 
hold on
plot(X(neg, 2), X(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('y1 marks in subject 2')
legend('pass', 'Failed')
hold off

%   Now we limit the x1 and x2 we need to leave or skip the first column x0 because they should stay as 1.

%  The critical thing hear to know is that this is not a linear regression but logistic regression, hence the h(hypothesis varies)
%  So we calculate the hypothesis that is based on e

% j_theta will be calculated upon all the training set for 1st iteration
% 


    g=inline('1.0 ./ (1.0 + exp(-z))');
    alpha=1;
    theta = zeros(size(x(1,:)))';   % the theta has to be a 3*1 matrix so that it can multiply by x that is m*3 matrix
    max_iter=2000;
    j_theta=zeros(max_iter,1);            % j is a zero matrix that is used to store the theta cost function j(theta)

    for num_iter=1:max_iter
    %  Now we calculate the hx or hypothetis, It is calculated here inside no. of iteration because the hupothesis has to be calculated for new theta for every iteration
         z=x*theta;
         h=g(z);     % Here the effect of inline function we used earlier will reflect
         h

           j_theta(num_iter)=(1/m)*(-y'* log(h) - (1 - y)'*log(1-h)) ;    % This formula is the vectorized form of the cost function J(theta) This calculates the cost function      
         j_theta
         theta = theta - (alpha/m) * x' * (1./(1+exp(-x*theta)) - y);
         %grad=(1/m) *  x' * (h-y);     % This formula is the gradient descent formula that calculates the theta value.  
         %theta=theta - alpha .* grad;          % Actual Calculation for theta
           theta
    end

figure
plot(0:1999, j_theta(1:2000), 'b', 'LineWidth', 2)
hold off


figure
%3. In this step we will plot the graph for the given input data set just to see how is the distribution of the two class.
pos = find(y == 1);  % This will take the postion or array number from y for all the class that has value 1 
neg = find(y == 0);  % Similarly this will take the position or array number from y for all class that has value 0
 % Now we plot the graph column x1 Vs x2 for y=1 and y=0
plot(X(pos, 2), X(pos,3), '+'); 
hold on
plot(X(neg, 2), X(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('y1 marks in subject 2')
legend('pass', 'Failed')


plot_x = [min(X(:,2))-2,  max(X(:,2))+2];     % This min and max decides the length of the decision graph.
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off

请查看每个数据集的图表如下:

对于数据集1:

enter image description here

对于dataset2:

enter image description here

正如您所见,数据集给了我正确答案。

话虽如此,我相信datsaet2的数据范围很广,可能是10-100,因此为了规范化,我使用了数据集2的特征缩放并得到了图形。形成的决策线是正确的,但略低于预期的位置,请亲自看看。

具有特征缩放的数据集2输入:

x =

1.00000  -1.16311  -0.89589
1.00000  -0.13957   1.21585
1.00000   1.39573  -0.89589
1.00000   0.37219   0.51194
1.00000   0.37219  -0.19198
1.00000   0.88396  -0.19198
1.00000  -0.65134  -0.19198
1.00000  -0.13957   0.51194
1.00000  -1.67487  -1.59981
1.00000  -0.65134  -0.19198
1.00000   1.39573   1.91977

y =

0
1
1
1
0
1
0
0
0
0
1

我在上面的代码中添加功能缩放后获得的图表如下所示

enter image description here

正如你可以看到决策线是否有点上升那么我会得到完美的输出..

请帮助我理解这个场景,为什么即使是功能扩展也无法帮助。或者如果我的代码有错误,或者我遗漏了什么。

0 个答案:

没有答案