Question

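I have gone through a lot of code on Stack Overflow and written my own along the same lines. There is a problem with this code that I cannot work out. I am storing the values of theta1 and theta2, as well as the cost function, for analysis purposes. The data for x and y can be downloaded from the Openclassroom page. It provides the x and y data as .dat files, which you can open in Notepad.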

%Single Variate Gradient Descent Algorithm%%
clc
clear all
close all;
% Step 1 Load x series/ Input data and Output data* y series

x=load('D:\Office Docs_Jay\software\ex2x.dat');
y=load('D:\Office Docs_Jay\software\ex2y.dat');
%Plot the input vectors
plot(x,y,'o');
ylabel('Height in meters');
xlabel('Age in years');

% Step 2 Add an extra column of ones in input vector
[m n]=size(x);
X=[ones(m,1) x];%Concatenate the ones column with x;
% Step 3 Create Theta vector
theta=zeros(n+1,1);%theta 0,1
% Create temporary values for storing summation

temp1=0;
temp2=0;
% Define Learning Rate alpha and Max Iterations

alpha=0.07;
max_iterations=1;
% Step 4 Iterate over loop
for i=1:1:max_iterations

    %Calculate Hypothesis for all training example
    for k=1:1:m
        h(k)=theta(1,1)+theta(2,1)*X(k,2); %#ok<AGROW>
        temp1=temp1+(h(k)-y(k));
        temp2=temp2+(h(k)-y(k))*X(k,2);
    end
    % Simultaneous Update
    tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
    tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);
    theta(1,1)=tmp1;
    theta(2,1)=tmp2;
    theta1_history(i)=theta(2,1); %#ok<AGROW>
    theta0_history(i)=theta(1,1); %#ok<AGROW>
    % Step 5 Calculate cost function
    tmp3=0;
    tmp4=0;
    for p=1:m
        tmp3=tmp3+theta(1,1)+theta(2,1)*X(p,1);
        tmp4=tmp4+theta(1,1)+theta(2,1)*X(p,2);
    end
    J1_theta0(i)=tmp3*(1/(2*m)); %#ok<AGROW>
    J2_theta1(i)=tmp4*(1/(2*m)); %#ok<AGROW>

end
theta
hold on;
plot(X(:,2),theta(1,1)+theta(2,1)*X);

I am getting theta values of 0.0373 and 0.1900; they should be 0.0745 and 0.3800.

The values I get are roughly half of what I am expecting.

Answer 1

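I have been trying to implement the iterative step with matrices and vectors (i.e. without updating each parameter of theta separately). Here is what I came up with (only the gradient step is shown):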

h = X * theta;  % hypothesis
err = h - y;    % error
gradient = alpha * (1 / m) * (X' * err); % gradient scaled by the learning rate
theta = theta - gradient;

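The part that is hard to grasp is that the "sum" in the gradient step of the earlier examples is actually performed by the matrix multiplication X'*err. You can also write it as (err'*X)'.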
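To see that equivalence concretely, here is a small sketch on made-up data. The values of `x`, `y` and `theta` below are assumptions chosen for illustration, not the exercise data. It accumulates the two sums with an explicit loop and compares them with `X' * err`:

```matlab
% Toy data, assumed for illustration only (not the ex2x/ex2y data).
x = [2; 3; 5; 7];
y = [0.8; 0.9; 1.1; 1.3];
m = length(y);
X = [ones(m, 1) x];        % design matrix with an intercept column
theta = [0.1; 0.2];

err = X * theta - y;       % m-by-1 vector of residuals

% Loop version: one running sum per parameter.
s0 = 0;
s1 = 0;
for k = 1:m
    s0 = s0 + err(k) * X(k, 1);
    s1 = s1 + err(k) * X(k, 2);
end

% Vectorized version: both sums in one matrix product.
g = X' * err;              % 2-by-1; g(1) matches s0 and g(2) matches s1

disp(max(abs(g - [s0; s1])));   % zero up to floating-point rounding
```

Each row of `X'` is one column of `X`, so multiplying it by `err` is exactly the element-wise product followed by a sum.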

Answer 2

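I managed to create an algorithm that uses more of the vectorized features Matlab supports. My algorithm is a little different from yours, but it performs the gradient descent process you describe. After running it and validating the result (using the polyfit function), I think the values expected in Openclassroom (exercise 2), theta(0) = 0.0745 and theta(1) = 0.3800, are wrong for 1500 iterations with a step of 0.07 (I take no responsibility for that claim). That is why I plotted my results with the data in one figure and your required results with the data in another, and I saw a big difference in how well each line fits the data.

First, have a look at the code: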

% Machine Learning : Linear Regression

clear all; close all; clc;

%% ======================= Plotting Training Data =======================
fprintf('Plotting Data ...\n')

x = load('ex2x.dat');
y = load('ex2y.dat');

% Plot Data
plot(x,y,'rx');
xlabel('X -> Input') % x-axis label
ylabel('Y -> Output') % y-axis label

%% =================== Initialize Linear regression parameters ===================
 m = length(y); % number of training examples

% initialize fitting parameters - all zeros
theta=zeros(2,1);%theta 0,1

% Some gradient descent settings
iterations = 1500;
Learning_step_a = 0.07; % step parameter

%% =================== Gradient descent ===================

fprintf('Running Gradient Descent ...\n')

%Compute Gradient descent

% Initialize Objective Function History
J_history = zeros(iterations, 1);

m = length(y); % number of training examples

% run gradient descent    
for iter = 1:iterations

   % In every iteration calculate hypothesis
   % (here theta(1) is the slope and theta(2) the intercept, the same order polyfit returns)
   hypothesis=theta(1).*x+theta(2);

   % Update theta variables
   temp0=theta(1) - Learning_step_a * (1/m)* sum((hypothesis-y).* x);
   temp1=theta(2) - Learning_step_a * (1/m) *sum(hypothesis-y);

   theta(1)=temp0;
   theta(2)=temp1;

   % Save objective function 
   J_history(iter)=(1/(2*m))*sum(( hypothesis-y ).^2);

end

% print theta to screen
fprintf('Theta found by gradient descent: %f %f\n',theta(1),  theta(2));
fprintf('Minimum of objective function is %f \n',J_history(iterations));

% Plot the linear fit
hold on; % keep previous plot visible 
plot(x, theta(1)*x+theta(2), '-')

% Validate with polyfit fnc
poly_theta = polyfit(x,y,1);
plot(x, poly_theta(1)*x+poly_theta(2), 'y--');
legend('Training data', 'Linear regression','Linear regression with polyfit')
hold off 

figure
% Plot Data
plot(x,y,'rx');
xlabel('X -> Input') % x-axis label
ylabel('Y -> Output') % y-axis label

hold on; % keep previous plot visible
% Validate with polyfit fnc
poly_theta = polyfit(x,y,1);
plot(x, poly_theta(1)*x+poly_theta(2), 'y--');

% for theta values that you are saying
theta(1)=0.0745;  theta(2)=0.3800;
plot(x, theta(1)*x+theta(2), 'g--')
legend('Training data', 'Linear regression with polyfit','Your thetas')
hold off

OK, the results are as follows:

With the theta(0) and theta(1) produced by my algorithm, the line fits the data.

Gradient descent - theta0=0.063883, theta1=0.750150

With theta(0) and theta(1) fixed to the values you gave, the line does not fit the data.

Gradient descent - theta0=0.0745, theta1=0.3800

Answer 3

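Here are some comments:

max_iterations is set to 1. Gradient descent is typically run until either the decrease in the objective function or the magnitude of the gradient falls below some threshold, which will likely take more than one iteration.
The factor 1/(2*m) is technically not correct. This should not cause the algorithm to fail, but it will effectively reduce the learning rate.
You are not computing the correct objective. The correct linear regression objective should be either one half of the mean of the squared residuals or one half of the sum of the squared residuals.
Rather than using for loops, you should take advantage of Matlab's vectorized computations. For instance, res=X*theta-y; obj=.5/m*res'*res; should compute the residuals (res) and the linear regression objective (obj).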
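Putting these comments together, a minimal sketch might look like the following. The toy data, the learning rate `alpha`, the tolerance `tol` and the iteration cap are all assumptions picked so that this small example converges; they are not the settings of the exercise.

```matlab
% Toy data, assumed for illustration only (not the ex2x/ex2y data).
x = [2; 3; 5; 7];
y = [0.8; 0.9; 1.1; 1.3];
m = length(y);
X = [ones(m, 1) x];
theta = zeros(2, 1);

alpha = 0.05;              % assumed learning rate, small enough for this data
tol = 1e-10;               % assumed threshold on the gradient magnitude
max_iterations = 100000;   % safety cap in case the threshold is never met

for iter = 1:max_iterations
    res = X * theta - y;            % residuals
    grad = (1 / m) * (X' * res);    % gradient of the objective (factor 1/m)
    if norm(grad) < tol
        break;                      % gradient is small enough, stop
    end
    theta = theta - alpha * grad;   % simultaneous update of both parameters
end

res = X * theta - y;
obj = 0.5 / m * (res' * res);       % half the mean of the squared residuals
theta_exact = X \ y;                % least-squares solution, for comparison
```

Comparing `theta` with `theta_exact` is a quick way to check that the loop has converged to the least-squares fit rather than merely stopped.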

Answer 4

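You need to put temp1 = 0 and temp2 = 0 as the first statements inside the iteration loop. If you do not, the current temp values carry over into the next iteration, which is wrong.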
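As a rough sketch of that change, with the accumulators reset at the top of each outer iteration: the toy data and settings below are assumptions for illustration, and the update uses the factor 1/m rather than the 1/(2*m) of the question.

```matlab
% Toy data and settings, assumed for illustration only.
x = [2; 3; 5; 7];
y = [0.8; 0.9; 1.1; 1.3];
m = length(y);
X = [ones(m, 1) x];
theta = zeros(2, 1);
alpha = 0.05;
max_iterations = 3;

for i = 1:max_iterations
    % Reset the accumulators so each iteration sums only its own residuals.
    temp1 = 0;
    temp2 = 0;
    for k = 1:m
        h_k = theta(1) + theta(2) * X(k, 2);
        temp1 = temp1 + (h_k - y(k));
        temp2 = temp2 + (h_k - y(k)) * X(k, 2);
    end
    % With the reset in place, the sums agree with the vectorized form.
    g = X' * (X * theta - y);
    assert(abs(temp1 - g(1)) < 1e-12 && abs(temp2 - g(2)) < 1e-12);

    theta = theta - alpha * (1 / m) * [temp1; temp2];
end
```

With max_iterations set to 1, as in the question, the missing reset has no visible effect, which is why it only shows up once the loop runs more than once.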

Answer 5

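Comparing the values of Ɵ (theta) you expect with the results of your program, one can see that the expected values are twice the results.

The mistake you are probably making is using 1/(2*m) in place of 1/m in the code that computes the derivative. In the derivative, the 2 in the denominator vanishes because the original term is (h_Ɵ(x) - y)², which on differentiation produces 2*(h_Ɵ(x) - y). The 2s cancel out.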
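Written out, assuming the usual cost function for this exercise with $h_\theta(x) = \theta_0 + \theta_1 x$ and $x_0^{(i)} = 1$, the cancellation looks like this:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$

$$
\frac{\partial J}{\partial \theta_j}
= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

$$
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

Since the update is linear in that factor, a single step from $\theta = 0$ taken with $1/(2m)$ lands at exactly half the value of a step taken with $1/m$, which matches 0.0373 versus 0.0745 and 0.1900 versus 0.3800.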

Modify these lines of code:

tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);

to

tmp1=theta(1,1)-(alpha*(1/m)*temp1);
tmp2=theta(2,1)-(alpha*(1/m)*temp2);

Hope it helps.

Gradient Descent Matlab implementation
