Gradient descent overshooting and cost blow-up in regularized logistic regression

Time: 2019-04-12 03:22:37

Tags: matlab machine-learning logistic-regression gradient-descent regularized

I am implementing regularized logistic regression in MATLAB and using gradient descent to find the parameters, all based on Andrew Ng's Coursera machine learning course. I am trying to code the cost function from Andrew's notes/videos, and I am not sure I got it right.

The main problem is that my cost seems to blow up when the number of iterations gets large. This happens whether or not I normalize the data (rescaling every feature to lie between 0 and 1; a quick sketch of what I mean follows the figures). The problem also makes the resulting decision boundary shrink (underfitting?). Below are three example results comparing the decision boundary obtained with my GD against the one from MATLAB's fminunc.

[Figure: Regularized GD vs. fminunc, along with the cost history (30K iterations)]
[Figure: Regularized GD vs. fminunc, along with the cost history (70K iterations)]
[Figure: Regularized GD vs. fminunc, along with the cost history (120K iterations)]
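(For clarity, by "normalize" I mean plain min-max scaling of the two raw feature columns before mapFeature, roughly as sketched below. This step is not shown in the full code further down, and the variable names are just for illustration.)

% Min-max scaling sketch (illustration only, not part of my actual code):
% rescale each raw feature column to lie in [0, 1] before calling mapFeature.
dataset = load('ex2data2.txt');
xRaw = dataset(:, 1:end-1);                               % the two raw feature columns
xScaled = (xRaw - min(xRaw)) ./ (max(xRaw) - min(xRaw));  % each column now in [0, 1]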

As can be seen, the cost rises sharply as the number of iterations grows. Could it be that I coded the cost incorrectly? Or can gradient descent really overshoot like this? In case it helps, I am including my code. The line I use to record the cost history is costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );, based on the following equation:

Regularized cost function for logistic regression:

J(theta) = -(1/m) * sum_i [ y_i * log(h_theta(x_i)) + (1 - y_i) * log(1 - h_theta(x_i)) ] + (lambda/(2*m)) * sum_{j=1..n} theta_j^2
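Spelled out as a small MATLAB helper, the equation above would look roughly like this (my own restatement for readability; it assumes the usual sigmoid hypothesis h = g(X*theta), with the intercept theta(1) excluded from the penalty):

% Regularized logistic-regression cost, restating the equation above.
function J = regLogisticCost(theta, X, y, lambda)
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));                % sigmoid hypothesis
    J = (-1/m) * ( y' * log(h) + (1 - y)' * log(1 - h) ) ...
        + (lambda / (2*m)) * sum(theta(2:end).^2);   % theta(1) is not penalized
end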

The full code is below. Note that it also calls a few other functions. Any pointers would be greatly appreciated! :) Thanks in advance!

% REGULARIZED Logistic Regression with Gradient Descent
clc; clear all; close all;
dataset = load('ex2data2.txt');
x = dataset(:,1:end-1); y = dataset(:,end); m = length(y);

% Mapping the features (includes adding the intercept term)
x = mapFeature(x(:,1), x(:,2)); % Change to polynomial of the 6th degree

% Define the initial thetas. Same as the number of features, including
% the newly added intercept term (1s)
theta = zeros(size(x,2),1) + 0.05;
initial_theta = theta; % will be used later...

% Set lambda equals to 1
lambda = 1;
% calculate theta transpose x and also the hypothesis h_x
alpha = 0.005;
itr = 120000; % number of iterations set to 120K
for i = 1:itr
    ttrx = x * theta; % theta transpose x
    h_x = 1 ./ (1 + exp(-ttrx)); % sigmoid hypothesis
    error = h_x - y;
    % the gradient a.k.a. the derivative of J(\theta)
    for j = 1:length(theta)
        if j == 1
            gradientA(j,1) = 1/m * (error)' * x(:,j);
            theta(j) =  theta(j) - alpha * gradientA(j,1);
        else
            gradientA(j,1) = (1/m * (error)' * x(:,j)) - (lambda/m)*theta(j);
            theta(j) =  theta(j) - alpha * gradientA(j,1);
        end
    end

    costHistory(i) = (-1 * ( (1/m) * y'*log(h_x) + (1-y)'*log(1-h_x))) + ( (lambda/(2*m)) * sum(theta(2:end).^2) );
end

[cost, grad] = costFunctionReg(initial_theta, x, y, lambda);
%  Using MATLAB's built-in function fminunc to minimize the cost function
%  Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 500);
%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost
[thetafm, cost] = fminunc(@(t)(costFunctionReg(t, x, y, lambda)), initial_theta, options);

close all;
plotDecisionBoundary_git(theta, x, y); % based on GD
plotDecisionBoundary_git(thetafm, x, y); % based on fminunc

figure;
plot(1:itr, costHistory(:), '--r');
title('The cost history based on GD');
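In case it helps with debugging, here is a rough finite-difference check I could run on the gradient (a sketch only; it assumes costFunctionReg returns the cost as its first output, as in the call above, and the same finite differences could also be compared against gradientA from my loop):

% Numerical gradient check (debugging sketch): compare the analytic gradient
% returned by costFunctionReg against central finite differences of the cost.
epsilon = 1e-4;
numGrad = zeros(size(initial_theta));
for k = 1:numel(initial_theta)
    tPlus  = initial_theta;  tPlus(k)  = tPlus(k)  + epsilon;
    tMinus = initial_theta;  tMinus(k) = tMinus(k) - epsilon;
    numGrad(k) = ( costFunctionReg(tPlus,  x, y, lambda) ...
                 - costFunctionReg(tMinus, x, y, lambda) ) / (2 * epsilon);
end
[~, analyticGrad] = costFunctionReg(initial_theta, x, y, lambda);
% A relative difference far above ~1e-9 would point to a gradient bug.
relDiff = norm(numGrad - analyticGrad) / norm(numGrad + analyticGrad)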

0 Answers:

No answers yet