Gradient descent overshooting and cost blow-up in regularized logistic regression

Time: 2019-04-12 03:22:37

Tags: matlab machine-learning logistic-regression gradient-descent regularized

I am implementing regularized logistic regression in MATLAB and using gradient descent to find the parameters, all based on Andrew Ng's Coursera machine learning course. I am trying to code the cost function from Andrew's notes/videos, and I am not sure I got it right.

The main problem is that my cost seems to blow up when the number of iterations gets large. This happens whether or not I normalize the data (rescaling every feature to lie between 0 and 1; a quick sketch of what I mean follows the figures). The problem also makes the resulting decision boundary shrink (underfitting?). Below are three example results comparing the decision boundary obtained with my GD against the one from MATLAB's fminunc.

[Figure: Regularized GD vs. fminunc, along with the cost history (30K iterations)]
[Figure: Regularized GD vs. fminunc, along with the cost history (70K iterations)]
[Figure: Regularized GD vs. fminunc, along with the cost history (120K iterations)]
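(For clarity, by "normalize" I mean plain min-max scaling of the two raw feature columns before mapFeature, roughly as sketched below. This step is not shown in the full code further down, and the variable names are just for illustration.)

% Min-max scaling sketch (illustration only, not part of my actual code):
% rescale each raw feature column to lie in [0, 1] before calling mapFeature.
dataset = load('ex2data2.txt');
xRaw = dataset(:, 1:end-1);                               % the two raw feature columns
xScaled = (xRaw - min(xRaw)) ./ (max(xRaw) - min(xRaw));  % each column now in [0, 1]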

As can be seen, the cost rises sharply as the number of iterations grows. Could it be that I coded the cost incorrectly? Or can gradient descent really overshoot like this? In case it helps, I am including my code. The line I use to record the cost history is costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );, based on the following equation:

Regularized cost function for logistic regression:

J(theta) = -(1/m) * sum_i [ y_i * log(h_theta(x_i)) + (1 - y_i) * log(1 - h_theta(x_i)) ] + (lambda/(2*m)) * sum_{j=1..n} theta_j^2
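Spelled out as a small MATLAB helper, the equation above would look roughly like this (my own restatement for readability; it assumes the usual sigmoid hypothesis h = g(X*theta), with the intercept theta(1) excluded from the penalty):

% Regularized logistic-regression cost, restating the equation above.
function J = regLogisticCost(theta, X, y, lambda)
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));                % sigmoid hypothesis
    J = (-1/m) * ( y' * log(h) + (1 - y)' * log(1 - h) ) ...
        + (lambda / (2*m)) * sum(theta(2:end).^2);   % theta(1) is not penalized
end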

The full code is below. Note that it also calls a few other functions. Any pointers would be greatly appreciated! :) Thanks in advance!

% REGULARIZED Logistic Regression with Gradient Descent
clc; clear all; close all;
dataset = load('ex2data2.txt');
x = dataset(:,1:end-1); y = dataset(:,end); m = length(y);

% Mapping the features (includes adding the intercept term)
x = mapFeature(x(:,1), x(:,2)); % Change to polynomial of the 6th degree

% Define the initial thetas. Same as the number of features, including
% the newly added intercept term (1s)
theta = zeros(size(x,2),1) + 0.05;
initial_theta = theta; % will be used later...

% Set lambda equals to 1
lambda = 1;
% calculate theta transpose x and also the hypothesis h_x
alpha = 0.005;
itr = 120000; % number of iterations set to 120K
for i = 1:itr
    ttrx = x * theta; % theta transpose x
    h_x = 1 ./ (1 + exp(-ttrx)); % sigmoid hypothesis
    error = h_x - y;
    % the gradient a.k.a. the derivative of J(\theta)
    for j = 1:length(theta)
        if j == 1
            gradientA(j,1) = 1/m * (error)' * x(:,j);
            theta(j) =  theta(j) - alpha * gradientA(j,1);
        else
            gradientA(j,1) = (1/m * (error)' * x(:,j)) - (lambda/m)*theta(j);
            theta(j) =  theta(j) - alpha * gradientA(j,1);
        end
    end

    costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );
end

[cost, grad] = costFunctionReg(initial_theta, x, y, lambda);
%  Using MATLAB's built-in function fminunc to minimize the cost function
%  Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 500);
%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost
[thetafm, cost] = fminunc(@(t)(costFunctionReg(t, x, y, lambda)), initial_theta, options);

close all;
plotDecisionBoundary_git(theta, x, y); % based on GD
plotDecisionBoundary_git(thetafm, x, y); % based on fminunc

figure;
plot(1:itr, costHistory(:), '--r');
title('The cost history based on GD');
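In case it helps with debugging, here is a rough finite-difference check I could run on the gradient (a sketch only; it assumes costFunctionReg returns the cost as its first output, as in the call above, and the same finite differences could also be compared against gradientA from my loop):

% Numerical gradient check (debugging sketch): compare the analytic gradient
% returned by costFunctionReg against central finite differences of the cost.
epsilon = 1e-4;
numGrad = zeros(size(initial_theta));
for k = 1:numel(initial_theta)
    tPlus  = initial_theta;  tPlus(k)  = tPlus(k)  + epsilon;
    tMinus = initial_theta;  tMinus(k) = tMinus(k) - epsilon;
    numGrad(k) = ( costFunctionReg(tPlus,  x, y, lambda) ...
                 - costFunctionReg(tMinus, x, y, lambda) ) / (2 * epsilon);
end
[~, analyticGrad] = costFunctionReg(initial_theta, x, y, lambda);
% A relative difference far above ~1e-9 would point to a gradient bug.
relDiff = norm(numGrad - analyticGrad) / norm(numGrad + analyticGrad)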

0 Answers:

No answers yet