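I have tried to implement an iterative (loop-based) version of the gradient descent algorithm, but it does not work correctly. The vectorized implementation of the same algorithm, however, works fine.
Here is the iterative implementation: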
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);
    % initialize the hypothesis vector
    h = zeros(nrows, 1);
    % initialize the temporary theta vector
    theta_temp = zeros(ncols, 1);
    % run gradient descent for the specified number of iterations
    count = 1;
    while count <= iterations
        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end
        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end
        % update the gradient with the factor
        fact = alpha / nrows;
        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end
        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end
        % update the count
        count += 1;
    end
end
Here is the vectorized implementation of the same algorithm:
function [theta, theta_all, J_cost] = gradientDescent(X, y, theta, alpha)
    % set the learning rate
    learn_rate = alpha;
    % set the number of iterations
    n = 1500;
    % number of training examples
    m = length(y);
    % initialize the theta_new vector
    l = length(theta);
    theta_new = zeros(l,1);
    % initialize the cost vector
    J_cost = zeros(n,1);
    % initialize the vector to store all the calculated theta values
    theta_all = zeros(n,2);
    % perform gradient descent for the specified number of iterations
    for i = 1 : n
        % calculate the hypothesis
        hypothesis = X * theta;
        % calculate the error
        err = hypothesis - y;
        % calculate the gradient
        grad = X' * err;
        % calculate the new theta
        theta_new = (learn_rate/m) .* grad;
        % update the old theta
        theta = theta - theta_new;
        % update the cost
        J_cost(i) = computeCost(X, y, theta);
        % store the calculated theta value
        if i < n
            index = i + 1;
            theta_all(index,:) = theta';
        end
    end
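The vectorized code calls computeCost, which the post does not show. The following is only a minimal sketch of what such a function might look like, assuming the usual squared-error cost for linear regression, J = (1 / (2m)) * sum((X * theta - y).^2). The body and the small data set are assumptions for illustration, not the asker's actual code or data.

```octave
1;  % leading statement so Octave treats this file as a script, not a function file

% Hypothetical computeCost: assumes the standard squared-error cost
% J = (1 / (2 * m)) * sum((X * theta - y) .^ 2). Not taken from the post.
function J = computeCost(X, y, theta)
  m = length(y);           % number of training examples
  err = X * theta - y;     % residual for every example
  J = (err' * err) / (2 * m);
end

% Made-up data lying exactly on y = 1 + x (first column of X is the intercept term).
X = [1 1; 1 2; 1 3];
y = [2; 3; 4];

J_perfect = computeCost(X, y, [1; 1]);  % exact fit, so the residuals are all zero
J_zero = computeCost(X, y, [0; 0]);     % residuals are -y, so J = (4 + 9 + 16) / 6

printf("J_perfect = %g, J_zero = %g\n", J_perfect, J_zero);
```

Recording this value at every iteration, as J_cost(i) does above, is a cheap way to see whether gradient descent is converging: under this assumed cost, the value should shrink steadily when the learning rate is small enough.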
The link to the dataset can be found here.
The file name is ex1data1.txt
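Problem
For an initial theta = [0, 0] (this is a vector!), a learning rate of 0.01, and 1500 iterations, I get the optimal theta as:
The above is the output of the vectorized implementation, which I know is correct (it passed all the test cases on Coursera).
However, when I implement the same algorithm using the iterative method (the first code listed), the theta values I get are (alpha = 0.01, iterations = 1500):
This implementation fails the test cases, so I know it is incorrect.
However, I cannot see what is wrong, because the iterative code does the same job and performs the same multiplications as the vectorized code. When I traced one iteration of both codes by hand (pen and paper!), the values were identical, yet the results differ when I run them in Octave.
Any help with this would be much appreciated, especially if you can point out where I went wrong and what exactly caused the failure.
Points to consider
In addition, here is the code used to preprocess the data: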
function [X, y] = fileReader(filename)
    % load the dataset
    dataset = load(filename);
    % get the dimensions of the dataset
    nrows = size(dataset, 1);
    ncols = size(dataset, 2);
    % generate the X matrix from the dataset
    X = dataset(:, 1 : ncols - 1);
    % generate the y vector
    y = dataset(:, ncols);
    % append 1's to the X matrix
    X = [ones(nrows, 1), X];
end
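Answer 0 (score: 1)
The problem with the first code is that the theta_temp and h vectors are not initialized properly. In the first iteration (when count equals 1) your code runs correctly, because for that particular iteration h and theta_temp have both been initialized to 0. However, these are temporary vectors for a single iteration of gradient descent, and they are never reset to zero vectors for the subsequent iterations. That is, from iteration 2 onward, the newly computed values of h(i) and theta_temp(j) are simply added to the old values left over from the previous iteration, so the code does not work properly. You need to reset both vectors to zero at the start of every iteration, and then it works correctly. Here is my version of your code (the first one; note the changes):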
function [theta] = gradientDescent_i(X, y, theta, alpha, iterations)
    % get the number of rows and columns
    nrows = size(X, 1);
    ncols = size(X, 2);
    % run gradient descent for the specified number of iterations
    count = 1;
    while count <= iterations
        % initialize the hypothesis vector
        h = zeros(nrows, 1);
        % initialize the temporary theta vector
        theta_temp = zeros(ncols, 1);
        % calculate the hypothesis values and fill into the vector
        for i = 1 : nrows
            for j = 1 : ncols
                term = theta(j) * X(i, j);
                h(i) = h(i) + term;
            end
        end
        % calculate the gradient
        for j = 1 : ncols
            for i = 1 : nrows
                term = (h(i) - y(i)) * X(i, j);
                theta_temp(j) = theta_temp(j) + term;
            end
        end
        % update the gradient with the factor
        fact = alpha / nrows;
        for i = 1 : ncols
            theta_temp(i) = fact * theta_temp(i);
        end
        % update the theta
        for i = 1 : ncols
            theta(i) = theta(i) - theta_temp(i);
        end
        % update the count
        count += 1;
    end
end
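I ran the code and it gives the same theta values that you mentioned. What I wonder, though, is how you concluded that the hypothesis vector was the same in both cases, when that is clearly one of the reasons the first code fails!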
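One way to see the effect of the stale accumulators is to compare a loop version against a vectorized reference after one and after two iterations. The sketch below is only an illustration: gd_loop, gd_vec, the reset_each_iter flag, and the three-row data set are hypothetical names and values, not part of the original post. With theta starting at zeros, the first iteration agrees even without the reset, which may be why a one-iteration trace by hand did not reveal the problem.

```octave
1;  % leading statement so Octave treats this file as a script, not a function file

% Hypothetical helper: loop-based gradient descent. When reset_each_iter is
% false, h and g keep their values between iterations, as in the question's code.
function theta = gd_loop(X, y, theta, alpha, iterations, reset_each_iter)
  [nrows, ncols] = size(X);
  h = zeros(nrows, 1);
  g = zeros(ncols, 1);
  for count = 1 : iterations
    if reset_each_iter
      h = zeros(nrows, 1);   % start the hypothesis sums from zero
      g = zeros(ncols, 1);   % start the gradient sums from zero
    end
    for i = 1 : nrows
      for j = 1 : ncols
        h(i) = h(i) + theta(j) * X(i, j);
      end
    end
    for j = 1 : ncols
      for i = 1 : nrows
        g(j) = g(j) + (h(i) - y(i)) * X(i, j);
      end
    end
    g = (alpha / nrows) * g;   % scale in place, like theta_temp in the question
    theta = theta - g;
  end
end

% Hypothetical vectorized reference with the same update rule.
function theta = gd_vec(X, y, theta, alpha, iterations)
  m = length(y);
  for k = 1 : iterations
    theta = theta - (alpha / m) * (X' * (X * theta - y));
  end
end

% Made-up data; the first column of X is the intercept term.
X = [1 1; 1 2; 1 3];
y = [2; 3; 4];
theta0 = [0; 0];
alpha = 0.01;

% Largest absolute difference from the vectorized result.
diff1_buggy = max(abs(gd_loop(X, y, theta0, alpha, 1, false) - gd_vec(X, y, theta0, alpha, 1)));
diff2_buggy = max(abs(gd_loop(X, y, theta0, alpha, 2, false) - gd_vec(X, y, theta0, alpha, 2)));
diff2_fixed = max(abs(gd_loop(X, y, theta0, alpha, 2, true) - gd_vec(X, y, theta0, alpha, 2)));

printf("1 iteration, no reset:    %g\n", diff1_buggy);
printf("2 iterations, no reset:   %g\n", diff2_buggy);
printf("2 iterations, with reset: %g\n", diff2_fixed);
```

In this toy setup the no-reset version already diverges at the second iteration because the scaled step from iteration 1 is still sitting in g when the new gradient is accumulated. Comparing against a vectorized reference for a small number of iterations is a cheap check before running the full 1500.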