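I am trying to use the MATLAB gpml toolbox (http://www.gaussianprocess.org/gpml/code/matlab/doc/) for classification, and I would like confidence bands around the final predicted probabilities. I am having trouble doing this because the online examples (and those found in places like GitHub) only show confidence intervals around the latent function and/or the predictive output mean. For binary classification, however, the predictive output mean must first be converted to a probability. Using MATLAB 2013a, this is what I have so far: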
%-------------------------------
% Create data from test cases
n = 30;
x = 10 * lhsdesign(n, 1);
prob_fun = @(x) 0.75 * normcdf(-x,-1.75,0.4) + 0.5 * normpdf(x,4.5,1) + 1.75*normpdf(x,7.5,0.75);
prob = prob_fun(x);
y = binornd(1, prob, n, 1);
test_cases = linspace(min(x), max(x), 500)';
% Convert to -1/1 for gp code. Also see what true function looks like.
y(y < 1) = -1;
true_probability = prob_fun(test_cases);
% plot(test_cases, true_probability, 'k-')
%-------------------------------
% Set mean to be constant. Put in terms of logit
meanF = {@meanConst};
meanY = mean(0.5 * (y + 1));
meanY = log(meanY / (1 - meanY));
hyp0.mean = meanY;
% Gaussian correlation (covariance) function. Just manually setting the
% length and scale parameters for now
covfunc = @covSEiso;
hyp0.cov = [0.75; 2.5];
likfunc = @likLogistic;
% Run GP model and make predictions on test cases
[ymus, ys2s, fmus, fs2s, ~, post] = gp(hyp0, @infEP, meanF, covfunc,...
likfunc, x, y, test_cases);
%-------------------------------
% Map the predictive mean of y (on the -1/1 scale) to P(y = +1) on the 0-1 scale:
ymus_prob = (ymus + 1) * 0.5;
% THIS IS WHERE I'M STUCK...IS THIS CORRECT?
ys2s_lower_prob = normcdf(ymus - 1.96* sqrt(ys2s));
ys2s_upper_prob = normcdf(ymus + 1.96* sqrt(ys2s));
% Alternative approach?
% ys2s_lower_prob = exp(ymus - 1.96* sqrt(ys2s)) ./...
% (1 + exp(ymus - 1.96* sqrt(ys2s)));
% ys2s_upper_prob = exp(ymus + 1.96* sqrt(ys2s)) ./...
% (1 + exp(ymus + 1.96* sqrt(ys2s)));
% Realizations converted
y_01 = y;
y_01(y_01 < 0) = 0;
%-------------------------------
% Plotting
figure()
subplot(1, 2, 1);
plot(x, y, 'ko'); hold on; % realizations
f = [ymus + 1.96*sqrt(ys2s);...
flipud(ymus - 1.96*sqrt(ys2s))];
fill([test_cases; flipud(test_cases)], f, [7 7 7]/8); % confidence region
plot(x, y, 'ko'); hold on; % realizations
plot(test_cases, true_probability, 'k--'); hold on; % true function
plot(test_cases, ymus, 'r-'); hold on; % predicted function
title('Predicted Values: Not Transformed')
subplot(1, 2, 2);
plot(x, y_01, 'ko'); hold on; % realizations
f = [ys2s_lower_prob;...
flipud(ys2s_upper_prob)];
fill([test_cases; flipud(test_cases)], f, [7 7 7]/8); % confidence region
plot(x, y_01, 'ko'); hold on; % realizations
plot(test_cases, true_probability, 'k--'); hold on; % true function
plot(test_cases, ymus_prob, 'r-'); hold on; % predicted function
title('Predicted Values: Transformed to 0-1 Scale')
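As you can see, I have been struggling to work out how to handle ys2s and how to express it in "probability terms". I thought I should try the inverse logit transform, but using normcdf gives better (tighter) results. The script produces a two-panel figure, with the untransformed predictions on the left and the predictions on the 0-1 scale on the right.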
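A side note on what `ys2s` measures, offered as a sketch rather than a definitive reading of gpml. If `ymus` and `ys2s` are the mean and variance of a label that only takes the values -1 and +1, then the variance is fully determined by the mean: with P(y = +1) = p, the mean is 2p - 1 and the variance is 4p(1 - p) = 1 - mean^2. In that case `ys2s` would describe the spread of the labels themselves, not the uncertainty about p. The probabilities below are made up for illustration, and `max_gap`, `naive_lo` and `naive_hi` are names introduced here.

```matlab
% Moments of a label y in {-1, +1} with P(y = +1) = p.
% The probabilities are made-up illustration values, not gp() output.
p   = [0.05; 0.25; 0.50; 0.90];
ymu = 2*p - 1;            % E[y]
ys2 = 4*p .* (1 - p);     % Var[y]

% The variance is a function of the mean alone: Var[y] = 1 - E[y]^2.
max_gap = max(abs(ys2 - (1 - ymu.^2)));

% A "mean +/- 1.96 sd" band built from these moments can leave [-1, 1],
% so it is hard to read as a band for a probability.
naive_lo = ymu - 1.96 * sqrt(ys2);
naive_hi = ymu + 1.96 * sqrt(ys2);

disp([p, ymu, ys2, naive_lo, naive_hi]);
```

The same check can be run on the real outputs with `max(abs(ys2s - (1 - ymus.^2)))`. If that is close to zero, `ys2s` carries no information beyond `ymus`.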
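Can someone offer some guidance on how to produce the predictive variance on the probability scale? I am not convinced I am doing this correctly: I understand that the confidence bands may not be symmetric, but in some places they do not even contain the mean.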
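One possible way to get a band on the probability scale, sketched under assumptions, is to build the interval on the latent scale and push its endpoints through the link function. `likLogistic` uses the logistic sigmoid, which is monotone, so the transformed endpoints always bracket the sigmoid of the latent mean. This treats the latent posterior at each test point as Gaussian with mean `fmus` and variance `fs2s`. The numbers below are made-up stand-ins for the `gp` outputs, and `sigmoid`, `p_lower`, `p_upper`, `p_median` and `p_mean` are names introduced here.

```matlab
% Stand-in latent predictions. In the script above these would be the
% fmus and fs2s outputs of gp(); the values here are made up.
fmus = [-2.0; -0.5; 0.0; 1.0; 3.0];     % latent predictive means
fs2s = [0.30; 0.80; 1.50; 0.60; 0.20];  % latent predictive variances

sigmoid = @(f) 1 ./ (1 + exp(-f));      % inverse logit (logistic link)

z = 1.96;                               % approx. 95% two-sided normal quantile
p_lower  = sigmoid(fmus - z * sqrt(fs2s));
p_upper  = sigmoid(fmus + z * sqrt(fs2s));
p_median = sigmoid(fmus);               % median of sigmoid(f), not its mean

% Averaged probability E[sigmoid(f)] for f ~ N(fmu, fs2), computed by
% brute-force trapezoidal quadrature over +/- 8 standard deviations.
p_mean = zeros(size(fmus));
for i = 1:numel(fmus)
    sd = sqrt(fs2s(i));
    f  = linspace(fmus(i) - 8*sd, fmus(i) + 8*sd, 4001);
    w  = exp(-0.5 * (f - fmus(i)).^2 / fs2s(i)) / sqrt(2*pi*fs2s(i));
    p_mean(i) = trapz(f, sigmoid(f) .* w);
end

disp([p_lower, p_median, p_mean, p_upper]);
```

In the script, this would mean computing the band from `fmus` and `fs2s` instead of `ymus` and `ys2s`. The averaged probability `p_mean` plays the role of `ymus_prob` and generally differs from `p_median`, so the red curve need not sit in the middle of the band.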
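In case it makes any difference, I am on a Windows 10 machine. I would also be happy to use R for this, but I cannot seem to find any package that provides the predictive output mean/variance alongside the predictive latent mean/variance. Thanks!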