Question

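I have a data vector of about 2 million samples that I suspect is a mixture of two Gaussians. I tried to fit a two-component mixture to the data, Data, using MATLAB's fitgmdist.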

From the histogram:

% Histogram of Data with 1000 bins, normalized to a pdf.
[Yhist, x] = histcounts(Data, 1000, 'normalization', 'pdf');
% Convert the bin edges to bin centers.
x = (x(1:end-1) + x(2:end))/2;

Using fitgmdist:

% Raise the iteration cap (the default is 100).
opts.MaxIter = 300;

% Set the tolerance to zero so that all iterations are run.
opts.TolFun = 0;

GMModel = fitgmdist(Data, 2, 'Options', opts, 'Start', 'plus');
wts = GMModel.ComponentProportion;
mu = GMModel.mu;
sig = sqrt(squeeze(GMModel.Sigma));
Ygmfit = wts(1)*normpdf(x(:), mu(1), sig(1)) + wts(2)*normpdf(x(:), mu(2), sig(2));

Mixture results with fitgmdist: wts = [0.6780, 0.322], mu = [-7.6444, -9.7831], sig = [0.8243, 0.5947]

Next I tried using nlinfit:

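% Define the callback function for nlinfit.
% a = [w1, w2, mu1, mu2, sig1, sig2]
function y = nmmix(a, x)
   % Rescale the two weights; note that sum(a) runs over all six entries of a.
   a(1:2) = a(1:2)/sum(a);
   y = a(1)*normpdf(x(:), a(3), a(5)) + a(2)*normpdf(x(:), a(4), a(6));
end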

init_wts = [0.66, 1-0.66];
init_mu = [-7.7, -9.75];
init_sig = [0.5, 0.5];
a = nlinfit(x(:), Yhist(:), @nmmix, [init_wts, init_mu, init_sig]);
wts = a(1:2)/sum(a(1:2));
mu = a(3:4);
sig = a(5:6);
Ynlinfit = wts(1)*normpdf(x(:), mu(1), sig(1)) + wts(2)*normpdf(x(:), mu(2), sig(2));

Mixture results with nlinfit: wts = [0.6349, 0.3651], mu = [-7.6305, -9.6991], sig = [0.6773, 0.6031]
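As a side check on the callback, the sketch below prints the weights that the nmmix code applies internally at the initial guess (it divides by sum(a) over all six parameters, while the reported wts divide by sum(a(1:2))). It also shows a possible five-parameter alternative in which the second weight is 1 - w1, so the weights sum to 1 by construction. The name nmmix5, its parameter order, and the integration grid are assumptions made for illustration; they are not part of the fit reported here.

```matlab
% Hypothetical five-parameter callback: a = [w1, mu1, mu2, sig1, sig2].
% The second weight is 1 - w1, so the two weights always sum to 1.
nmmix5 = @(a, x) a(1)*normpdf(x(:), a(2), a(4)) ...
               + (1 - a(1))*normpdf(x(:), a(3), a(5));

% Sanity check on an assumed grid: the modelled density should integrate
% to about 1 at the initial guess used in the question.
a0 = [0.66, -7.7, -9.75, 0.5, 0.5];
t = linspace(-15, -2, 5001).';
area = trapz(t, nmmix5(a0, t));
fprintf('area under nmmix5 at a0: %.6f\n', area);

% For comparison: the weights the six-parameter nmmix code computes
% internally at the same initial guess, since it divides by sum(a).
p0 = [0.66, 0.34, -7.7, -9.75, 0.5, 0.5];
wInternal = p0(1:2)/sum(p0);
fprintf('weights inside nmmix at p0: [%.4f, %.4f]\n', wInternal);

% With x and Yhist from the histogram step in scope, the fit would read:
% a = nlinfit(x(:), Yhist(:), nmmix5, a0);
```

Whether this changes the fitted curve noticeably is not something the sketch can show; it only makes the weight constraint explicit and removes the redundant scale parameter.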

% Plot to compare the results
figure;
hold on
plot(x(:), Yhist, 'b');
plot(x(:), Ygmfit, 'k');
plot(x(:), Ynlinfit, 'r');

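Visually, the nonlinear fit (red curve) seems to approximate the histogram (blue curve) better than the "fitgmdist" fit (black curve). The result is similar even with a finer histogram, say with 100,000 bins.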

What is the source of this discrepancy?

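Added later: of course one would not expect the results to be identical, but I would expect the visual quality of the two fits to be comparable.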
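One way to put numbers on "visual quality" is to evaluate both parameter sets under both criteria: fitgmdist maximizes the log-likelihood of the raw samples, while nlinfit (with default options) minimizes the sum of squared residuals against the histogram heights. The sketch below is a minimal illustration under stated assumptions: Data is replaced by synthetic draws (200,000 rather than 2 million) generated from the reported fitgmdist parameters, and the helper names mixpdf, meanLogLik and sse are made up for this example. Substituting the real Data vector is what would make the comparison meaningful.

```matlab
% ASSUMPTION: synthetic stand-in for Data, NOT the original samples.
rng(0);
n = 2e5;
isFirst = rand(n, 1) < 0.6780;
Data = zeros(n, 1);
Data(isFirst)  = -7.6444 + 0.8243*randn(nnz(isFirst), 1);
Data(~isFirst) = -9.7831 + 0.5947*randn(nnz(~isFirst), 1);

% Same histogram construction as in the question.
[Yhist, edges] = histcounts(Data, 1000, 'Normalization', 'pdf');
x = (edges(1:end-1) + edges(2:end))/2;

% Mixture density for p = [w1, w2, mu1, mu2, sig1, sig2].
mixpdf = @(p, t) p(1)*normpdf(t, p(3), p(5)) + p(2)*normpdf(t, p(4), p(6));

pGm = [0.6780, 0.3220, -7.6444, -9.7831, 0.8243, 0.5947]; % reported fitgmdist values
pNl = [0.6349, 0.3651, -7.6305, -9.6991, 0.6773, 0.6031]; % reported nlinfit values

% Criterion maximized by fitgmdist: log-likelihood of the samples
% (shown here as a per-sample average).
meanLogLik = @(p) mean(log(mixpdf(p, Data)));

% Criterion minimized by nlinfit: squared error against histogram heights.
sse = @(p) sum((Yhist(:) - mixpdf(p, x(:))).^2);

fprintf('fitgmdist params: mean logL = %.5f, SSE = %.5f\n', meanLogLik(pGm), sse(pGm));
fprintf('nlinfit params:   mean logL = %.5f, SSE = %.5f\n', meanLogLik(pNl), sse(pNl));
```

On the real data, a parameter set can score better on one criterion and worse on the other, since a curve overlaid on a histogram is judged by eye in roughly the least-squares sense.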

nlinfit appears to fit the normal mixture better than fitgmdist

0 answers: