Multivariate gradient descent in Matlab - what is the difference between these two pieces of code?

Asked: 2018-08-14 19:20:16

Tags: matlab machine-learning octave gradient-descent

The following function uses gradient descent to find the optimal values of theta for a regression line. The inputs (X, y) are attached below. My question is: what is the difference between code 1 and code 2? Why does code 2 work while code 1 does not?

Thanks!
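For reference, the update both versions are meant to compute is the standard batch gradient descent step for linear regression: for each parameter j,

theta(j) := theta(j) - (alpha/m) * sum_{i=1..m} (X(i,:)*theta - y(i)) * X(i,j)

where m is the number of training examples and alpha is the learning rate.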

GRADIENTDESCENTMULTI performs gradient descent to learn theta; it updates theta by taking num_iters gradient steps with learning rate alpha.

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta);
J_history = zeros(num_iters, 1);
costs = zeros(n,1);

for iter = 1:num_iters
    % code 1 - doesn't work 
    for c = 1:n
        for i = 1:m
            costs(c) = costs(c)+(X(i,:)*theta - y(i))*X(i,c);
        end  
    end

    % code 2 - does work
    E = X * theta - y;
    for c = 1:n
        costs(c) = sum(E.*X(:,c));
    end

    % update each theta
    for c = 1:n
        theta(c) = theta(c) - alpha*costs(c)/m;
    end
    J_history(iter) = computeCostMulti(X, y, theta);    
end
end

function J = computeCostMulti(X, y, theta)

for i=1:m
    J = J+(X(i,:)*theta - y(i))^2;
end
J = J/(2*m);

The code that runs it:

alpha = 0.01;
num_iters = 200; 

% Init Theta and Run Gradient Descent 
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');

% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');

X is

1.0000    0.1300   -0.2237
1.0000   -0.5042   -0.2237
1.0000    0.5025   -0.2237
1.0000   -0.7357   -1.5378
1.0000    1.2575    1.0904
1.0000   -0.0197    1.0904
1.0000   -0.5872   -0.2237
1.0000   -0.7219   -0.2237
1.0000   -0.7810   -0.2237
1.0000   -0.6376   -0.2237
1.0000   -0.0764    1.0904
1.0000   -0.0009   -0.2237
1.0000   -0.1393   -0.2237
1.0000    3.1173    2.4045
1.0000   -0.9220   -0.2237
1.0000    0.3766    1.0904
1.0000   -0.8565   -1.5378
1.0000   -0.9622   -0.2237
1.0000    0.7655    1.0904
1.0000    1.2965    1.0904
1.0000   -0.2940   -0.2237
1.0000   -0.1418   -1.5378
1.0000   -0.4992   -0.2237
1.0000   -0.0487    1.0904
1.0000    2.3774   -0.2237
1.0000   -1.1334   -0.2237
1.0000   -0.6829   -0.2237
1.0000    0.6610   -0.2237
1.0000    0.2508   -0.2237
1.0000    0.8007   -0.2237
1.0000   -0.2034   -1.5378
1.0000   -1.2592   -2.8519
1.0000    0.0495    1.0904
1.0000    1.4299   -0.2237
1.0000   -0.2387    1.0904
1.0000   -0.7093   -0.2237
1.0000   -0.9584   -0.2237
1.0000    0.1652    1.0904
1.0000    2.7864    1.0904
1.0000    0.2030    1.0904
1.0000   -0.4237   -1.5378
1.0000    0.2986   -0.2237
1.0000    0.7126    1.0904
1.0000   -1.0075   -0.2237
1.0000   -1.4454   -1.5378
1.0000   -0.1871    1.0904
1.0000   -1.0037   -0.2237

y is

  399900
  329900
  369000
  232000
  539900
  299900
  314900
  198999
  212000
  242500
  239999
  347000
  329999
  699900
  259900
  449900
  299900
  199900
  499998
  599000
  252900
  255000
  242900
  259900
  573900
  249900
  464500
  469000
  475000
  299900
  349900
  169900
  314900
  579900
  285900
  249900
  229900
  345000
  549000
  287000
  368500
  329900
  314000
  299000
  179900
  299900
  239500

1 Answer:

Answer 0 (score: 1)

Both versions work fine for me, I think. The main thing is that in code 1 you keep adding to costs(c) but never reset it to zero before the next iteration. The only change you need is to add something like costs(c) = 0; after for c = 1:n and before for i = 1:m. I did have to modify the code slightly to get it running for me (mainly computeCostMulti), and I changed the plot to show that both methods give the same result. All in all, here is a working demonstration snippet with those changes:

close all; clear; clc;

%% Data
X = [1.0000  0.1300 -0.2237; 1.0000 -0.5042 -0.2237; 1.0000  0.5025 -0.2237; 1.0000 -0.7357 -1.5378;
    1.0000  1.2575  1.0904; 1.0000 -0.0197  1.0904; 1.0000 -0.5872 -0.2237; 1.0000 -0.7219 -0.2237;
    1.0000 -0.7810 -0.2237; 1.0000 -0.6376 -0.2237; 1.0000 -0.0764  1.0904; 1.0000 -0.0009 -0.2237;
    1.0000 -0.1393 -0.2237; 1.0000  3.1173  2.4045; 1.0000 -0.9220 -0.2237; 1.0000  0.3766  1.0904;
    1.0000 -0.8565 -1.5378; 1.0000 -0.9622 -0.2237; 1.0000  0.7655  1.0904; 1.0000  1.2965  1.0904;
    1.0000 -0.2940 -0.2237; 1.0000 -0.1418 -1.5378; 1.0000 -0.4992 -0.2237; 1.0000 -0.0487  1.0904;
    1.0000  2.3774 -0.2237; 1.0000 -1.1334 -0.2237; 1.0000 -0.6829 -0.2237; 1.0000  0.6610 -0.2237;
    1.0000  0.2508 -0.2237; 1.0000  0.8007 -0.2237; 1.0000 -0.2034 -1.5378; 1.0000 -1.2592 -2.8519;
    1.0000  0.0495  1.0904; 1.0000  1.4299 -0.2237; 1.0000 -0.2387  1.0904; 1.0000 -0.7093 -0.2237;
    1.0000 -0.9584 -0.2237; 1.0000  0.1652  1.0904; 1.0000  2.7864  1.0904; 1.0000  0.2030  1.0904;
    1.0000 -0.4237 -1.5378; 1.0000  0.2986 -0.2237; 1.0000  0.7126  1.0904; 1.0000 -1.0075 -0.2237;
    1.0000 -1.4454 -1.5378; 1.0000 -0.1871  1.0904; 1.0000 -1.0037 -0.2237];
y = [399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999,...
    699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900,...
    464500 469000 475000 299900 349900 169900 314900 579900 285900 249900 229900 345000 549000,...
    287000 368500 329900 314000 299000 179900 299900 239500]';

alpha = 0.01;
num_iters = 200;

% Init Theta and Run Gradient Descent
theta0 = zeros(3, 1);
[theta_result_1, J_history_1] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 1);
[theta_result_2, J_history_2] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 2);

% Plot the convergence graph for both methods
figure;
x = 1:numel(J_history_1);
subplot(5,1,1:4);
plot(x,J_history_1,x,J_history_2);
xlim([min(x) max(x)]);
set(gca,'XTickLabel','');
ylabel('Cost J');
grid on;

subplot(5,1,5);
stem(x,(J_history_1-J_history_2)./J_history_1,'ko');
xlim([min(x) max(x)]);
xlabel('Number of iterations');
ylabel('frac. \DeltaJ');
grid on;

% Display gradient descent's result
fprintf('Theta computed from gradient descent with method 1: \n');
fprintf(' %f \n', theta_result_1);
fprintf('Theta computed from gradient descent with method 2: \n');
fprintf(' %f \n', theta_result_2);
fprintf('\n');

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters, METHOD)
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta);
J_history = zeros(num_iters, 1);

costs = zeros(n,1);
for iter = 1:num_iters

    if METHOD == 1 % code 1 - does work
        for c = 1:n
            costs(c) = 0;
            for i = 1:m
                costs(c) = costs(c) + (X(i,:)*theta - y(i)) *X(i,c);
            end
        end
    elseif METHOD == 2 % code 2 - does work
        E = X * theta - y;
        for c = 1:n
            costs(c) = sum(E.*X(:,c));
        end
    else
        error('unknown method');
    end

    % update each theta
    for c = 1:n
        theta(c) = theta(c) - alpha*costs(c)/m;
    end
    J_history(iter) = computeCostMulti(X, y, theta);
end
end

function J = computeCostMulti(X, y, theta)
m = length(y); J = 0;
for mi = 1:m
    J = J + (X(mi,:)*theta - y(mi))^2;
end
J = J/(2*m);
end

But again, all you really need to add is the costs(c) = 0; line.
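As a side note, a minimal fully vectorized sketch of the same update (equivalent in result to code 2 followed by the theta loop, assuming X is m-by-n, y is m-by-1 and theta is n-by-1) would be:

E = X * theta - y;                    % residuals for all m training examples
theta = theta - (alpha/m) * (X' * E); % update all n parameters in one step

Placed inside the iteration loop, this removes the costs accumulator entirely, and with it the need to reset it.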

Also: I'd recommend always adding a close all; clear; clc; line at the start of your scripts, to make sure they still work after being copied and pasted into Stack Overflow.