I am writing regularized logistic regression in MATLAB and using gradient descent to find the parameters, all based on Andrew Ng's Coursera Machine Learning course. I am trying to code the cost function from Andrew's notes/videos, and I am not sure I have it right.
The main problem: if the number of iterations gets too large, my cost appears to blow up. This happens whether or not I normalize the data (scaling every feature into [0, 1]). The problem also makes the resulting decision boundary shrink (underfitting?). Below are three example results comparing the decision boundary obtained from GD against the one from MATLAB's fminunc.
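(By normalization I mean plain min-max scaling of each feature column into [0, 1], along these lines; this is a sketch, not my exact preprocessing code:)

% Min-max scaling: map each feature column into [0, 1]
% (implicit expansion needs R2016b+; use bsxfun on older MATLAB versions)
x = (x - min(x)) ./ (max(x) - min(x));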
As the plots show, the cost rises sharply as the iteration count grows. Did I code the cost incorrectly, or can gradient descent really overshoot like this? In case it helps, I am including my code. The line I use to record the cost history is:
costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );
, based on the regularized cost function from the notes:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

where the intercept \theta_0 is excluded from the penalty term.
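The gradient-descent updates I am trying to implement are the ones from the lectures, with \theta_0 left unregularized:

\theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}

\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right], \quad j \geq 1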
The full code is below. Note that it also calls a few other helper functions (sketches of the two relevant ones are at the end of this post). Any pointers would be greatly appreciated! :) Thanks in advance!
% REGULARIZED Logistic Regression with Gradient Descent
clc; clear all; close all;
dataset = load('ex2data2.txt');
x = dataset(:,1:end-1); y = dataset(:,end); m = length(y);
% Mapping the features (includes adding the intercept term)
x = mapFeature(x(:,1), x(:,2)); % Change to polynomial of the 6th degree
% Define the initial thetas. Same as the number of features, including
% the newly added intercept term (1s)
theta = zeros(size(x,2),1) + 0.05;
initial_theta = theta; % will be used later...
% Set lambda equals to 1
lambda = 1;
% Gradient descent settings
alpha = 0.005;
itr = 120000; % number of iterations set to 120K
for i = 1:itr
    ttrx = x * theta; % theta transpose x
    h_x = 1 ./ (1 + exp(-ttrx)); % sigmoid hypothesis
    error = h_x - y;
    % the gradient, a.k.a. the derivative of J(theta)
    for j = 1:length(theta)
        if j == 1
            % intercept term: not regularized
            gradientA(j,1) = 1/m * (error)' * x(:,j);
            theta(j) = theta(j) - alpha * gradientA(j,1);
        else
            gradientA(j,1) = (1/m * (error)' * x(:,j)) - (lambda/m)*theta(j);
            theta(j) = theta(j) - alpha * gradientA(j,1);
        end
    end
    % record the cost: cross-entropy plus the L2 penalty on theta(2:end)
    costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );
end
% Cost and gradient at the initial theta (sanity check)
[cost, grad] = costFunctionReg(initial_theta, x, y, lambda);
% Using MATLAB's built-in function fminunc to minimize the cost function
% Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 500);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[thetafm, cost] = fminunc(@(t)(costFunctionReg(t, x, y, lambda)), initial_theta, options);
close all;
plotDecisionBoundary_git(theta, x, y); % based on GD
plotDecisionBoundary_git(thetafm, x, y); % based on fminunc
figure;
plot(1:itr, costHistory(:), '--r');
title('The cost history based on GD');
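For completeness, here are sketches of the two helpers the script calls. Both follow the standard templates from the ex2 assignment; treat them as sketches of what I am running rather than exact copies. mapFeature expands the two raw features into all polynomial terms up to degree 6 (28 columns, including the intercept column of ones):

function out = mapFeature(X1, X2)
% Map two features to all polynomial terms up to the 6th degree,
% prepending a column of ones for the intercept term.
degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)) .* (X2.^j);
    end
end
end

And costFunctionReg, the regularized cost and gradient handed to fminunc:

function [J, grad] = costFunctionReg(theta, X, y, lambda)
% Vectorized regularized logistic regression cost and gradient.
% The intercept theta(1) is excluded from the penalty term.
m = length(y);
h = 1 ./ (1 + exp(-(X * theta))); % sigmoid hypothesis
J = (-1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
    + (lambda / (2*m)) * sum(theta(2:end).^2);
grad = (1/m) * (X' * (h - y));
grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);
end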