Question

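I am taking a machine learning course, and machine learning is a great field for me. In the first programming exercise I ran into some difficulty with the gradient descent algorithm. I would be grateful if someone could help me.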

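Here are the instructions for updating the thetas:

"You will implement gradient descent in the file gradientDescent.m. The loop structure has been written for you, and you only need to supply the updates to θ within each iteration."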

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    %GRADIENTDESCENT Performs gradient descent to learn theta
    %   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
    %   taking num_iters gradient steps with learning rate alpha

    % Initialize some useful values
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);

    for iter = 1:num_iters

        % ====================== YOUR CODE HERE ======================
        % Instructions: Perform a single gradient step on the parameter vector
        %               theta.
        %
        % Hint: While debugging, it can be useful to print out the values
        %       of the cost function (computeCost) and gradient here.
        %
        % ============================================================

        % Save the cost J in every iteration
        J_history(iter) = computeCost(X, y, theta);

    end

    end

So I did the following to update the thetas simultaneously:

    temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y));
    temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X);
    theta(1,1) = temp0;
    theta(2,1) = temp1;

I get an error when I run this code. Can anyone help me?

Answer 1

I have explained why you can use the vectorized form

theta = theta - (alpha/m) * (X' * (X * theta - y));

or the equivalent

theta = theta - (alpha/m) * ((X * theta - y)' * X)';

in this answer.

Quoting it below:

Explanation of the matrix version of the gradient descent algorithm:

This is the gradient descent algorithm used to fine-tune the value of θ:
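The formula itself is not shown here. Assuming the answer refers to the standard batch gradient descent rule for linear regression, it would read as follows, using the symbols α, m, h(x), x and y from this thread:

```latex
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\qquad \text{(simultaneously for every } j\text{)}
```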

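Assume the following values of X, y and θ are given:

m = number of training examples
n = number of features + 1

Here,

m = 5 (training examples)
n = 4 (features + 1)
X = m x n matrix
y = m x 1 vector matrix
θ = n x 1 vector matrix
xⁱ is the i-th training example
x_j is the j-th feature in a given training example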

Further,

h(x) = ([X] * [θ]) (m x 1 matrix of predicted values for our training set)
h(x)-y = ([X] * [θ] - [y]) (m x 1 matrix of errors in our predictions)
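The dimensions listed above can be checked with a small Octave/MATLAB sketch. The numbers in `features` and `y` are made up purely for illustration; only the shapes (m = 5, n = 4) come from the text.

```matlab
% Hypothetical data: m = 5 training examples, n = 4 columns (3 features + 1)
m = 5;
features = [1 2 3; 2 1 0; 0 1 4; 3 3 1; 1 0 2];  % made-up feature values
X = [ones(m, 1), features];   % m x n design matrix, first column is all ones
y = [6; 4; 7; 8; 3];          % m x 1 vector of targets (made up)
theta = zeros(4, 1);          % n x 1 parameter vector

h = X * theta;                % m x 1 vector of predictions
E = h - y;                    % m x 1 vector of prediction errors

disp(size(X));   % 5 4
disp(size(h));   % 5 1
disp(size(E));   % 5 1
```

With theta initialized to zeros, every prediction is 0, so E is simply -y here.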

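The whole objective of machine learning is to minimize the errors in predictions. Based on the above, our error matrix is an m x 1 vector matrix, as follows: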
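Written out with the definitions above, the error vector would presumably look like this. The name E is the one the text uses; the entries e_i are labels introduced here for illustration.

```latex
E = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix}
  = \begin{bmatrix}
      h(x^{(1)}) - y^{(1)} \\
      h(x^{(2)}) - y^{(2)} \\
      \vdots \\
      h(x^{(m)}) - y^{(m)}
    \end{bmatrix}
  = X\theta - y
```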

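To calculate the new value of θ_j, we have to take the sum of all the errors (m rows), each multiplied by the j-th feature value of the training set X. That is, take every value in E, multiply each one by the j-th feature of the corresponding training example, and add them all together. This helps us obtain the new (and hopefully better) value of θ_j. Repeat this process for all j, that is, for every feature. In matrix form, this can be written as: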
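The per-feature sum described above can be sketched as an explicit loop in Octave/MATLAB and compared with the matrix product X' * E. The data, the starting `theta` and the learning rate `alpha` are hypothetical values chosen for illustration.

```matlab
% Hypothetical data with m = 5 examples and n = 4 columns (made up for illustration)
X = [ones(5, 1), [1 2 3; 2 1 0; 0 1 4; 3 3 1; 1 0 2]];
y = [6; 4; 7; 8; 3];
theta = [0.5; 1; -1; 2];
alpha = 0.01;                 % assumed learning rate
m = length(y);

E = X * theta - y;            % m x 1 errors, computed once from the old theta

theta_new = zeros(size(theta));
for j = 1:numel(theta)
    % sum over all examples of (error_i * j-th feature of example i)
    s = sum(E .* X(:, j));
    theta_new(j) = theta(j) - (alpha / m) * s;
end

% The same step in matrix form, for comparison
theta_mat = theta - (alpha / m) * (X' * E);
disp(max(abs(theta_new - theta_mat)));   % should be zero up to rounding
```

Because E is computed before the loop, every component of theta_new is derived from the same old theta, which is what a simultaneous update requires.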

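This can be simplified to:

[E]' x [X] gives a row vector matrix, since E' is a 1 x m matrix and X is an m x n matrix. But we are interested in a column matrix, so we transpose the result.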

More succinctly, it can be written as:

The same result can also be written as:
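The formulas for these steps are not shown here. A reconstruction consistent with the two code lines quoted at the start of this answer, and with the identity (AB)ᵀ = BᵀAᵀ, would be:

```latex
\theta := \theta - \frac{\alpha}{m}\,\left(E^{\top} X\right)^{\top}
        = \theta - \frac{\alpha}{m}\,X^{\top} E
        = \theta - \frac{\alpha}{m}\,X^{\top}\left(X\theta - y\right)
```

Here E'X is 1 x n, so its transpose is the n x 1 column that matches the shape of θ.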

Answer 2

theta = theta - (alpha/m) * (X' * (X * theta - y));

This is the correct answer.

Answer 3

temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y));
temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X(:,2));
theta(1,1) = temp0;
theta(2,1) = temp1;

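Alternatively, you can use the following code, which is simpler. Here there are only two parameters, theta1 and theta2, but a loop like this works better when there are more parameters.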

err = X*theta - y;   % compute the error once so every update uses the old theta
for i=1:2
    theta(i) = theta(i) - (alpha/m)*sum(err.*X(:,i));
end
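When the update is written as a loop, it is worth checking that every component is computed from the same (old) theta. The sketch below uses made-up data and an assumed learning rate to compare a loop that reuses one error vector with a loop that recomputes X*theta after theta(1) has already changed.

```matlab
% Hypothetical data: one feature plus the intercept column (made up)
X = [ones(4, 1), [1; 2; 3; 4]];
y = [2; 4; 6; 8];
alpha = 0.1;                  % assumed learning rate
m = length(y);
theta0 = [0; 0];

% Simultaneous update: the error is computed once from the old theta
theta_sim = theta0;
err = X * theta0 - y;
for i = 1:2
    theta_sim(i) = theta0(i) - (alpha / m) * sum(err .* X(:, i));
end

% Sequential update: X*theta is recomputed after theta(1) has changed
theta_seq = theta0;
for i = 1:2
    theta_seq(i) = theta_seq(i) - (alpha / m) * sum((X * theta_seq - y) .* X(:, i));
end

disp([theta_sim, theta_seq]);   % the second row differs between the two columns
```

On this data the simultaneous step gives theta = [0.5; 1.5], while the sequential one gives [0.5; 1.375], so the two are not interchangeable.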

Answer 4

The error you are getting,

Error using .*
Matrix dimensions must agree.
Error in gradientDescent (line 20)
temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X);

means that .* does not work here. So, before that line, add the following code:

size(X*theta-y)
size(X)

If you want to compute (X*theta-y).*X, then X*theta-y and X must have the same size. If they do not, you need to check your algorithm.
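A minimal sketch of that size check, on made-up data with the same shapes as in the exercise (m rows, two columns in X). Note that, depending on the MATLAB or Octave version, a mismatched product such as an m x 1 vector times an m x 2 matrix may be expanded automatically instead of raising this error, so printing the sizes is the more reliable diagnostic.

```matlab
% Hypothetical data with the shapes used in the exercise (m examples, 2 columns)
m = 4;
X = [ones(m, 1), [1; 2; 3; 4]];
y = [2; 4; 6; 8];
theta = zeros(2, 1);

err = X * theta - y;
disp(size(err));          % 4 1
disp(size(X));            % 4 2
disp(size(X(:, 2)));      % 4 1, same shape as err

% Selecting a single column makes the element-wise product well defined
g2 = sum(err .* X(:, 2));
disp(g2);                 % -60
```

The sizes show the mismatch directly: the error vector has one column, X has two, so only a single column of X can be multiplied element-wise with it.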

Answer 5

There is one thing to note in this problem:

X = [ones(m, 1), data(:,1)];

So

theta = theta - (alpha / m) * (X' * (X * theta - y));

and

temp0 = theta(1, 1) - (alpha / m) * sum((X * theta - y));
temp1 = theta(2, 1) - (alpha / m) * sum((X * theta - y) .* X(:, 2));
theta(1, 1) = temp0;
theta(2, 1) = temp1;

are both correct.
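To see that the two forms agree in practice, here is a sketch that runs both inside the same loop. The `data` and `y` values, `alpha`, `num_iters` and the anonymous `cost` function are all assumptions made for illustration; `cost` is only a stand-in for the exercise's computeCost, assumed to be the usual squared-error cost divided by 2m.

```matlab
% Hypothetical data standing in for the exercise's data(:,1) and y
data = [1; 2; 3; 4; 5];
y = [3; 5; 7; 9; 11];          % made up so that y = 1 + 2*x exactly
m = length(y);
X = [ones(m, 1), data(:, 1)];

alpha = 0.05;                  % assumed learning rate
num_iters = 2000;              % assumed iteration count
cost = @(t) sum((X * t - y) .^ 2) / (2 * m);   % stand-in for computeCost

theta_vec = zeros(2, 1);       % updated with the vectorized form
theta_tmp = zeros(2, 1);       % updated with temp0/temp1
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    theta_vec = theta_vec - (alpha / m) * (X' * (X * theta_vec - y));

    temp0 = theta_tmp(1, 1) - (alpha / m) * sum(X * theta_tmp - y);
    temp1 = theta_tmp(2, 1) - (alpha / m) * sum((X * theta_tmp - y) .* X(:, 2));
    theta_tmp(1, 1) = temp0;
    theta_tmp(2, 1) = temp1;

    J_history(iter) = cost(theta_vec);
end

disp(max(abs(theta_vec - theta_tmp)));   % agreement up to rounding
disp(J_history(end) < J_history(1));     % 1 if the cost decreased
```

The temp0/temp1 version works because the first column of X is all ones, so summing the errors is the same as multiplying them by X(:, 1).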

Gradient descent in Matlab
