SVM with gradient descent - formula

Posted: 2014-09-15 16:59:09

Tags: machine-learning octave svm gradient-descent

I am having some trouble implementing a linear SVM (support vector machine) with gradient descent.

The formulas I am using are given below.

[image: the two equations used, the cost function J(theta) and the per-feature theta update rule]

The first equation is the cost function and the second is the update for each feature's theta value.

c is the fitting parameter (the regularization parameter).

alpha determines the rate at which the descent converges.
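
In plain notation, what the Octave code below actually computes on every iteration is the following (h is the sigmoid hypothesis g(X*theta); this is just a transcription of my code, so any error here is the error I am asking about):

J(\theta) = C \left( -y^{T} \log(h) - (1 - y)^{T} \log(1 - h) \right) + \frac{1}{2} \theta^{T} \theta, \quad h = g(X\theta) = \frac{1}{1 + e^{-X\theta}}

\theta := \theta - \alpha C^{2} X^{T} (h - y) - \alpha \cdot \frac{1}{2} \theta^{T} \theta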

Somehow, when I run the above formulas on my data set, my J(theta) keeps increasing; it never decreases. I have tried every combination I could think of by changing the values of c and alpha.

If there is a mistake in the formulas, I would be glad if someone could point it out.

Here is the Octave code I am using:

clear all                                                                                                                    
clc

x=[3,1;2,2;1,2;1.5,3;4,1;4,2;4,3;4,5];

y=[1;1;1;1;0;0;0;0];

[m,n]=size(x);
x=[ones(m,1),x];

X=x;

hold off
% Plot the given input data set just to see how the two classes are distributed.
pos = find(y == 1);  % indices (positions in y) of all examples of class y = 1
neg = find(y == 0);  % indices (positions in y) of all examples of class y = 0
% Plot column x1 vs x2 for y = 1 and for y = 0


hold on  
plot(X(pos, 2), X(pos,3), '+');                                                      
plot(X(neg, 2), X(neg, 3), 'o');
axis([min(x(:,2))-2,max(x(:,2))+2, min(x(:,3))-2, max(x(:,3))+2])
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Pass', 'Failed')
hold off


% Feature scaling
% Normalize x1 and x2; the first column x0 is skipped because it should stay as 1.
% Without feature scaling the decision line can come out the wrong way around.
%mn = mean(x);
%sd = std(x);
%x(:,2) = (x(:,2) - mn(2))./ sd(2);
%x(:,3) = (x(:,3) - mn(3))./ sd(3);




% Algorithm for linear SVM
g=inline('1.0 ./ (1.0 + exp(-z))');
theta = zeros(size(x(1,:)))'; 
max_iter=100;
j_theta=zeros(max_iter,1);            % j_theta is a zero vector used to store the cost J(theta) at each iteration
c=0.1;    
alpha=0.1;
for num_iter = 1:max_iter
    z = x * theta;
    h = g(z);          % hypothesis values for all training examples
    h                  % (debug) print the hypothesis
    j_theta(num_iter) = c .* (-y' * log(h) - (1 - y)' * log(1 - h)) + ((0.5) * (theta' * theta));  % cost; the second term is the regularization
    %% the above equation computes the cost function

    grad = (c^2) * x' * (h - y);                      %% compute the gradient
    reg_exprson = alpha .* (0.5) * (theta' * theta);  %% compute the regularization term
    theta = theta - (alpha .* grad) - reg_exprson;    %% compute the new theta vector for each feature
    theta              % (debug) print the updated theta

end

figure
plot(0:max_iter-1, j_theta, 'b', 'LineWidth', 2)   % cost J(theta) against iteration number
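
For comparison, here is a minimal sketch of the update step I would get by differentiating the cost above literally: the derivative of the 0.5 * theta' * theta term is theta itself, and c appears to the first power. I am not certain this is the right fix, so please treat it as an assumption rather than a correction:

theta = zeros(size(x, 2), 1);               % re-initialize theta before trying this variant
for num_iter = 1:max_iter
    h = g(x * theta);                       % hypothesis for all m examples
    j_theta(num_iter) = c .* (-y' * log(h) - (1 - y)' * log(1 - h)) + 0.5 * (theta' * theta);
    grad = c .* (x' * (h - y)) + theta;     % gradient of the data term scaled by c, plus the regularization gradient
    theta = theta - alpha .* grad;          % plain gradient-descent step
end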

Thanks.

0 Answers:

There are no answers yet.