Question

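Is there a way in MATLAB to check whether a histogram's distribution is unimodal or bimodal?

Edit

Do you think Hartigan's Dip Statistic would work? I tried passing an image to it and got a value of 0. What does that mean?

Also, when an image is passed in, does it test the distribution of the image's gray-level histogram?

Thanks.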

Answer 1

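Here is a script that uses Nic Price's implementation of Hartigan's Dip Test to identify unimodal distributions. The tricky part is computing xpdf, which despite its name is not a probability density function but a sorted sample.

p_value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming the null hypothesis is true. Here the null hypothesis is that the distribution is unimodal, so a small p_value is evidence against unimodality.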
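To make the p-value definition concrete, here is a minimal sketch of how a bootstrap test can turn a statistic into a p-value. The name empirical_p and the numbers are made up for illustration (they are not real dip values), and whether HartigansDipSignifTest counts ties the same way is an assumption I have not checked.

```matlab
% Empirical p-value: fraction of null statistics that are at least as
% extreme (here: at least as large) as the observed statistic.
% empirical_p is a hypothetical helper, not part of Nic Price's code.
empirical_p = @(observed, null_stats) mean(null_stats >= observed);

% Illustrative statistics computed under the null hypothesis (made up).
null_stats = [0.010 0.020 0.030 0.040 0.050];

p_small_stat = empirical_p(0.015, null_stats);  % 4 of 5 null values are >= 0.015
p_large_stat = empirical_p(0.045, null_stats);  % 1 of 5 null values is >= 0.045

fprintf('small statistic: p = %.2f, large statistic: p = %.2f\n', ...
        p_small_stat, p_large_stat);
```

A large observed statistic is rare under the null hypothesis, so it yields a small p-value.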

close all; clear all;

% Octave accepts this definition here. Older MATLAB releases require local
% functions in a script to sit at the end of the file (or in compute_xpdf.m).
function [x2, n, b] = compute_xpdf(x)
  x2 = reshape(x, 1, numel(x));   % flatten to a row vector
  [n, b] = hist(x2, 40);          % n: counts, b: bin centers (for plotting only)
  % Despite the name, this is not a probability density function:
  % the dip test is given the sorted sample itself.
  x2 = sort(x2);
  % Downsample (keep every 1000th value) to speed up the computation
  x2 = x2(1:1000:end);
end

nboot = 500;              % number of bootstrap samples
sample_size = [256 256];

% Unimodal
sample2d = normrnd(0.0, 10.0, sample_size);

[xpdf, n, b] = compute_xpdf(sample2d);
[dip, p_value, xlow, xup] = HartigansDipSignifTest(xpdf, nboot);

figure;
subplot(1,2,1);
bar(b, n)   % bin centers on x, counts on y
title(sprintf('p-value (H0: unimodal) %.2f', p_value))

% Bimodal
sample2d = sign(sample2d) .* (abs(sample2d) .^ 0.5);

[xpdf, n, b] = compute_xpdf(sample2d);
[dip, p_value, xlow, xup] = HartigansDipSignifTest(xpdf, nboot);

subplot(1,2,2);
bar(b, n)   % bin centers on x, counts on y
title(sprintf('p-value (H0: unimodal) %.2f', p_value))

print -dpng modality.png

Result of script execution
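For the image case raised in the question, the same preparation can be sketched as follows: the test is fed the sorted pixel values, so it looks at the distribution of gray levels rather than at the 2-D image. The random image, the variable names, and the step of 16 are assumptions for illustration; the commented call uses the signature shown in the script above and needs Nic Price's code on the path.

```matlab
% Stand-in for a grayscale image (in practice something like imread output).
img = uint8(randi([0 255], 64, 64));

step = 16;                          % hypothetical downsampling factor

% Flatten to a row vector, convert to double, and sort ascending.
pixels = sort(double(img(:)'));

% Keep every step-th value to reduce the cost of the bootstrap.
xpdf = pixels(1:step:end);

% Not run here, requires HartigansDipSignifTest.m:
% [dip, p_value] = HartigansDipSignifTest(xpdf, 500);
fprintf('%d sorted gray levels prepared for the dip test\n', numel(xpdf));
```

Converting to double before sorting avoids integer arithmetic on uint8 data in whatever the test computes afterwards.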

Answer 2

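There are many different ways to do what you ask. Taken literally, "bimodal" means there are two peaks. Usually you want the two peaks to be separated by some reasonable distance, and you want each of them to contain a reasonable proportion of the total counts. Only you know what is "reasonable" for your situation, but the following approach may help.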

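Create a histogram of the intensities
Use cumsum to form the cumulative histogram
For different values of the "cut" between distributions (25%, 30%, 50%, ...), compute the mean and standard deviation of the two distributions (above and below the cut).
Compute the distance between the means divided by the sum of the standard deviations of the two distributions
That quantity will be at its maximum at the "best cut"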

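You have to decide how large this quantity must be to count as "bimodal". Here is some code that demonstrates what I am talking about. It generates distributions of increasing bimodality: two Gaussians with an increasing delta between them (the step size equals the standard deviation). I compute the quantity described above and plot it for a series of different delta values. I then fit a parabola through that curve over a range corresponding to ±1 sigma of the entire distribution. As you can see, two things happen as the distribution becomes more bimodal: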

The curvature of this curve flips (from a valley to a peak)
The maximum increases (it is about 1.33 for a single Gaussian).
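The figure of about 1.33 for a single Gaussian can be checked directly. The sketch below assumes a standard normal cut exactly at its mean, so each half is a half-normal distribution; the variable names are mine, and the Monte Carlo part is only a sanity check.

```matlab
% Analytic value for a standard normal split at its mean.
half_mean = sqrt(2/pi);        % mean of |z| for z ~ N(0,1)
half_std  = sqrt(1 - 2/pi);    % standard deviation of |z|

% Distance between the two half means divided by the sum of the two stds.
q_exact = (2 * half_mean) / (2 * half_std);

% Monte Carlo check using the same definition as the steps above.
z  = randn(1e5, 1);
lo = z(z < 0);
hi = z(z >= 0);
q_mc = (mean(hi) - mean(lo)) / (std(lo) + std(hi));

fprintf('analytic q = %.4f, simulated q = %.4f\n', q_exact, q_mc);
```

The analytic value is close to 1.32, which agrees with the peak printed for delta = 0 in the output further down.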

You can look at these quantities for some of your own distributions and decide where you want to put the cutoff.

% test for bimodal distribution
close all
for delta = 0:10:50
    % two Gaussians (sigma = 10) whose means differ by delta
    a1 = randn(100,100) * 10 + 25;
    a2 = randn(100,100) * 10 + 25 + delta;
    a3 = [a1(:); a2(:)];
    [h, hb] = hist(a3, 0:100);
    cs = cumsum(h);
    % restrict the cuts to the range between the 20% and 80% points
    llimi = find(cs < 0.2 * max(cs(:)));
    ulimi = find(cs > 0.8 * max(cs(:)));
    llim = hb(llimi(end));
    ulim = hb(ulimi(1));
    cuts = linspace(llim, ulim, 20);
    dmean = mean(a3);
    dstd = std(a3);
    m = zeros(numel(cuts), 2);   % means below / above each cut
    s = zeros(numel(cuts), 2);   % standard deviations below / above each cut
    for ci = 1:numel(cuts)
        d1 = a3(a3<cuts(ci));
        d2 = a3(a3>=cuts(ci));
        m(ci, 1) = mean(d1);
        m(ci, 2) = mean(d2);
        s(ci, 1) = std(d1);
        s(ci, 2) = std(d2);
    end
    % separation of the means relative to the summed spreads
    q = (m(:, 2) - m(:, 1)) ./ sum(s, 2);
    figure;
    plot(cuts, q);
    title(sprintf('delta = %d', delta))
    % compute curvature of plot around mean:
    xlims = dmean + [-1 1] * dstd;
    % element-wise & is required here; && only accepts scalars
    indx = find(cuts < xlims(2) & cuts > xlims(1));
    pf = polyfit(cuts(indx), q(indx), 2);
    peak = polyval(pf, dmean);   % separate name so the matrix m is not overwritten
    fprintf(1, 'coefficients: a = %.2e, peak = %.2f\n', pf(1), peak);
end

Output:

coefficients: a = 1.37e-03, peak = 1.32
coefficients: a = 1.01e-03, peak = 1.34
coefficients: a = 2.85e-04, peak = 1.45
coefficients: a = -5.78e-04, peak = 1.70
coefficients: a = -1.29e-03, peak = 2.08
coefficients: a = -1.58e-03, peak = 2.48
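One way to turn these two numbers into a decision is sketched below, using the values printed above. The rule (negative curvature and a peak above a cutoff) and the cutoff of 1.5 are my own assumptions for illustration; you would tune them on your own distributions.

```matlab
% Values copied from the printed output, one entry per delta.
delta = 0:10:50;
a     = [1.37e-03 1.01e-03 2.85e-04 -5.78e-04 -1.29e-03 -1.58e-03];
peak  = [1.32 1.34 1.45 1.70 2.08 2.48];

peak_cutoff = 1.5;   % hypothetical threshold, not from the original answer

% Call a distribution bimodal when the fitted parabola opens downward
% (a < 0) and the separation measure exceeds the cutoff.
is_bimodal = (a < 0) & (peak > peak_cutoff);

for k = 1:numel(delta)
    fprintf('delta = %2d: bimodal = %d\n', delta(k), is_bimodal(k));
end
```

With this cutoff the two criteria agree on every row: the cases with delta of 30 and above are flagged, the others are not.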

Example plots:

delta = 0

delta = 4 sigma

Histogram for delta = 40:
