Gaussian process binary classification: computing the predictive output variance on the probability scale

Posted: 2018-08-07 15:21:25

Tags: matlab classification binary-data

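I'm trying to use the gpml toolbox for MATLAB (http://www.gaussianprocess.org/gpml/code/matlab/doc/) for classification, and I would like confidence bands around the final predicted probabilities. I'm having trouble doing this, because the examples online (and the ones I can find on places like GitHub) only show confidence intervals around the latent function and/or the predicted output mean. For binary classification, however, the predicted output mean first has to be converted into a probability. Using MATLAB 2013a, here is what I have: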
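
For reference, here are the moments of a label coded as y in {-1, +1} (the coding the script below uses), with p = P(y = +1). This is plain Bernoulli algebra. The assumption, implied by the script's (ymus + 1) * 0.5 conversion, is that gpml's ymus and ys2s are exactly these label moments:

```latex
\mathbb{E}[y] = 2p - 1, \qquad
\operatorname{Var}[y] = 1 - (2p - 1)^2 = 4p(1 - p), \qquad
p = \frac{\mathbb{E}[y] + 1}{2}
```

Under that assumption ys2s is a function of ymus alone. It measures the spread of a single ±1 label, not the uncertainty in the estimate of p.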

%-------------------------------
% Create data from test cases
n = 30;
x = 10 * lhsdesign(n, 1);
prob_fun = @(x) 0.75 * normcdf(-x,-1.75,0.4) + 0.5 * normpdf(x,4.5,1) + 1.75*normpdf(x,7.5,0.75);
prob = prob_fun(x); 
y = binornd(1, prob, n, 1);
test_cases = linspace(min(x), max(x), 500)';

% Convert to -1/1 for gp code. Also see what true function looks like. 
y(y < 1) = -1;
true_probability = prob_fun(test_cases); 
% plot(test_cases, true_probability, 'k-')


%-------------------------------
% Constant mean function, initialized to the logit of the observed
% proportion of positive labels
meanF = {@meanConst};
meanY = mean(0.5 * (y + 1));
meanY = log(meanY / (1 - meanY));
hyp0.mean = meanY;

% Squared-exponential (Gaussian) covariance function. Length-scale and
% signal scale are set by hand for now; gpml expects them on the log scale,
% i.e. hyp.cov = [log(ell); log(sf)]
covfunc = @covSEiso;
hyp0.cov  = [0.75; 2.5];
likfunc = @likLogistic;

% Run GP model and make predictions on test cases
[ymus, ys2s, fmus, fs2s, ~, post] = gp(hyp0, @infEP, meanF, covfunc,...
    likfunc, x, y, test_cases);


%-------------------------------
% Map the predicted output mean from the [-1, 1] scale to probabilities in [0, 1]:
ymus_prob = (ymus + 1) * 0.5;    

% THIS IS WHERE I'M STUCK...IS THIS CORRECT?
ys2s_lower_prob =  normcdf(ymus + 1.96* sqrt(ys2s));
ys2s_upper_prob =  normcdf(ymus - 1.96* sqrt(ys2s)); 

% Alternative approach?
% ys2s_lower_prob =  exp(ymus + 1.96* sqrt(ys2s)) ./...
%     (1 + exp(ymus + 1.96* sqrt(ys2s)));
% ys2s_upper_prob =  exp(ymus - 1.96* sqrt(ys2s)) ./...
%     (1 + exp(ymus - 1.96* sqrt(ys2s)));  

% Realizations converted
y_01 = y;
y_01(y_01 < 0) = 0;


%-------------------------------
% Plotting
figure()
subplot(1, 2, 1);
plot(x, y, 'ko'); hold on;                                      % realizations
f = [ymus + 1.96*sqrt(ys2s);...
flipdim(ymus - 1.96*sqrt(ys2s), 1)];
fill([test_cases; flipdim(test_cases,1)], f, [7 7 7]/8);        % confidence region
plot(x, y, 'ko'); hold on;                                      % realizations
plot(test_cases, true_probability, 'k--'); hold on;             % true function
plot(test_cases, ymus, 'r-'); hold on;                          % predicted function 
title('Predicted Values: Not Transformed')

subplot(1, 2, 2);
plot(x, y_01, 'ko'); hold on;                                   % realizations
f = [ys2s_lower_prob;...
flipdim(ys2s_upper_prob, 1)];
fill([test_cases; flipdim(test_cases,1)], f, [7 7 7]/8);        % confidence region
plot(x, y_01, 'ko'); hold on;                                   % realizations
plot(test_cases, true_probability, 'k--'); hold on;             % true function
plot(test_cases, ymus_prob, 'r-'); hold on;                     % predicted function 
title('Predicted Values: Transformed to 0-1 Scale')
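
The hyp0.cov = [0.75; 2.5] setting in the script is easier to read once the log scale is undone. Below is a minimal Python sketch of the squared-exponential kernel with the parameterization gpml's covSEiso uses (hyp = [log(ell); log(sf)]). The function name cov_se_iso is hypothetical and this is not gpml code:

```python
import math

def cov_se_iso(x, z, hyp):
    """Squared-exponential covariance for scalar inputs.

    hyp = [log(ell), log(sf)]: log length-scale and log signal std,
    the same layout as hyp0.cov in the MATLAB script.
    """
    ell = math.exp(hyp[0])
    sf2 = math.exp(2.0 * hyp[1])          # signal variance sf^2
    return sf2 * math.exp(-((x - z) ** 2) / (2.0 * ell ** 2))

hyp_cov = [0.75, 2.5]                      # values copied from hyp0.cov
print("length-scale ell:", math.exp(hyp_cov[0]))
print("signal std sf:", math.sqrt(cov_se_iso(0.0, 0.0, hyp_cov)))
```

With these values ell is about 2.12 and sf about 12.18, where sf is the prior standard deviation of the latent (logit-scale) function.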

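As you can see, I've been struggling to figure out what to do with ys2s and how to put it in "probability terms". I thought I should try the inverse logit transform, but normcdf gives better (tighter) results. The code produces the following plot:
[Figure: two panels, "Predicted Values: Not Transformed" and "Predicted Values: Transformed to 0-1 Scale"]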
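
One way to compare the inverse-logit and normcdf ideas is to build the interval on the latent scale and push its endpoints through the link. The script already computes the latent fmus and fs2s but does not use them. Because the link is increasing, the 2.5% and 97.5% quantiles of link(f) for f ~ N(fmu, fs2) are the link applied to fmu -/+ 1.96*sqrt(fs2). The sketch below assumes the link should match the likelihood passed to gp (logistic for likLogistic; the standard normal CDF would correspond to likErf instead). The fmu and fs2 values are hypothetical, and this is one common construction, not gpml's documented method:

```python
import math

def logistic(t):
    """Inverse logit, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def probit(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def latent_band_to_prob(fmu, fs2, link, z=1.96):
    """Central z-interval of f ~ N(fmu, fs2), mapped through an increasing link."""
    sd = math.sqrt(fs2)
    return link(fmu - z * sd), link(fmu + z * sd)

fmu, fs2 = 1.0, 0.25   # hypothetical latent mean/variance at one test point
print("logistic band:", latent_band_to_prob(fmu, fs2, logistic))
print("probit band:  ", latent_band_to_prob(fmu, fs2, probit))
```

The two bands differ only because the links differ, so "tighter" is not a reason to prefer one. The link should be whichever one the likelihood in the model uses.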

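Can anyone give some guidance on how to produce the predictive variance on the probability scale? I'm not sure I'm doing this correctly: I understand that the confidence bands need not be symmetric, but in some places they don't even contain the mean.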
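
The observation that the band sometimes misses the mean can be reproduced numerically. In the script the mean is mapped with (ymu + 1)/2, but the band endpoints are mapped with normcdf, so the two are on different scales. This sketch assumes ys2 = 1 - ymu^2 (the variance of a ±1 label with mean ymu). That gpml returns exactly this for likLogistic is an assumption worth checking against your gpml version:

```python
import math

def normcdf(t):
    """Standard normal CDF, like MATLAB's normcdf(t) with mu = 0, sigma = 1."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def script_band(ymu):
    """Mean and band as the script computes them at one test point."""
    ys2 = 1.0 - ymu ** 2                   # assumed label variance
    half = 1.96 * math.sqrt(ys2)
    p = (ymu + 1.0) * 0.5                  # ymus_prob in the script
    lo, hi = normcdf(ymu - half), normcdf(ymu + half)
    return p, lo, hi

for ymu in (0.0, 0.9, 0.99):
    p, lo, hi = script_band(ymu)
    print(f"ymu={ymu:5.2f}  p={p:.3f}  band=[{lo:.3f}, {hi:.3f}]  contains p: {lo <= p <= hi}")
```

Near ymu = ±1 the label variance shrinks toward zero while normcdf(ymu) stays well away from (ymu + 1)/2, which is where the band drifts off the mean.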

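In case it makes a difference, I'm on a Windows 10 machine. I'd also be happy to use R for this, but I can't seem to find any package that provides the predictive output mean/variance as well as the predictive latent mean/variance. Thanks!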
