如何在Matlab中拟合多峰对数正态分布?

时间:2019-05-10 11:27:09

标签: matlab distribution curve-fitting

我需要拟合表示粒度测量值的多峰分布。例如,这些测量结果可能如下所示:

Particle size

现在,我想拟合这些曲线。在this answer的帮助下,对于单峰分布函数,我可以获得相当不错的结果:

fun = @(p,x)(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2)));

通过像这样缩放结果参数:

[yM_in, pp_in] = max(DPF_in_mean);
xM_in = x_data(pp_in);
[yM_out, pp_out] = max(DPF_out_mean);
xM_out = x_data(pp_out);

xR_in = x_data / xM_in;
yR_in = DPF_in_mean / yM_in;
xR_out = x_data / xM_out;
yR_out = DPF_out_mean / yM_out;

opts = optimoptions('lsqcurvefit','TolX',1e-4,'TolFun',1e-8);
p0 = [1,1,1];
p_in = lsqcurvefit(fun,p0,xR_in,yR_in,[],[],opts);
p_out = lsqcurvefit(fun,p0,xR_out,yR_out,[],[],opts);

p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];
p_out_scaled = [ yM_out * p_out(1) * xM_out, p_out(2) + log(xM_out), p_out(3) ];

但是,如果我绘制结果拟合,则很明显单峰分布不足以拟合测量结果:

Fit with unimodal distribution

在有关multimodal distribution的Wikipedia文章中,看来我可以像这样融合第二个发行版:

fun = @(p,x)(p(7)*(p(1)./x .* 1./(p(3)*sqrt(2*pi)).*exp(-(log(x)-p(2)).^2./(2*p(3)^2))) + (1 - p(7))*(p(4)./x .* 1./(p(5)*sqrt(2*pi)).*exp(-(log(x)-p(6)).^2./(2*p(5)^2))));

但是我不知道如何在缩放中集成其他参数

p_in_scaled = [ yM_in * p_in(1) * xM_in, p_in(2) + log(xM_in), p_in(3) ];

因为我不太了解此扩展步骤中发生了什么。

如何使用多峰分布来拟合测量结果?

编辑

使用的数据如下:

x_data = [4.87000000000000e-09 5.62000000000000e-09 6.49000000000000e-09 7.50000000000000e-09 8.66000000000000e-09 ...
          1.00000000000000e-08 1.15500000000000e-08 1.33400000000000e-08 1.54000000000000e-08 1.77800000000000e-08 ...
          2.05400000000000e-08 2.37100000000000e-08 2.73800000000000e-08 3.16200000000000e-08 3.65200000000000e-08 ...
          4.21700000000000e-08 4.87000000000000e-08 5.62300000000000e-08 6.49400000000000e-08 7.49900000000000e-08 ...
          8.66000000000000e-08 1.00000000000000e-07 1.15480000000000e-07 1.33350000000000e-07 1.53990000000000e-07 ...
          1.77830000000000e-07 2.05350000000000e-07 2.37140000000000e-07 2.73840000000000e-07 3.16230000000000e-07 ...
          3.65170000000000e-07 4.21700000000000e-07 4.86970000000000e-07 5.62340000000000e-07 6.49380000000000e-07 ...
          7.49890000000000e-07 8.65960000000000e-07 1.00000000000000e-06];

DPF_in_mean = [188318640795.745 360952462222.222 750859638450.704 2226776878843.93 4845941940346.82 7979258430057.80 ...
               11010887350289.0 13462058712138.7 15090350247398.8 15991756383815.0 16680978441618.5 17862081914450.9 ...
               20071390890173.4 23460963364161.9 27630428508670.5 31777265780346.8 35520451433526.0 38587652184971.1 ...
               40516972485549.1 41326812092485.6 41127130682080.9 40038712485549.1 37976259664739.9 34725415132948.0 ...
               30177578265896.0 24546703179190.8 18400851109826.6 12500471611560.7 7540309609248.56 3912091102658.96 ...
               1632974141040.46 458500289086.705 126012891030.303 0 0 0 7276263267.44526 11203995842.0392];

DPF_out_mean = [444898373533.333 1032357396444.44 1675044380444.44 2316141430222.22 2852971589555.56 3151959865111.11 ...
                3134892475777.78 2828026308000.00 2325761940666.67 1745907627777.78 1192912799111.11 742253282222.222 ...
                430349362888.889 255820144555.556 188235813444.444 181970493622.222 204829338533.333 233009821977.778 ...
                243007623333.333 230736732777.778 202426609488.889 169758857200.000 140604138622.222 116482776222.222 ...
                95076737155.5556 74172071777.7778 53672033733.3333 35251323911.1111 20813708255.5556 11102006362.8889 ...
                5497173092.96089 2625918349.76536 1471042995.80373 1012939492.96541 751738952.194595 589422111.731818 ...
                479373451.936508 378359645.767442]; 

1 个答案:

答案 0 :(得分:1)

这是一种可能有用的可能性。由于数据中的大峰会遮盖较小的峰,因此减去较大的峰数据会留下较小的峰数据以供分析。如果您知道大峰的形式,则可以对数据进行拟合,然后在每个数据集中仅剩下两个双峰峰之一用于分析。一旦找到第二个峰的形式,就可以通过使用先前的分析拟合值作为最终参数的初始参数值拟合两个峰之和来重新开始。

我已经对两个数据集进行了方程搜索,以找到适合每个集合中主要峰的峰方程,这是我的结果。没有进行数据转换或预处理,我使用了发布的原始数据。

对于DPF_in_mean,我的主峰为:

in_plot

def Peak_LogNormalAShifted_model(x_in): # from zunzun.com using DPF_in_mean
    # coefficients
    a = 4.2518873952455000E+13
    b = -1.6088042345890798E+01
    c = 6.0084502637701809E-01
    d = -2.3091703726006359E-08

    return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in-d)-b) / c, 2.0))

对于DPF_out_mean,我的主峰为:

def Peak_LogNormalA_model(x_in): # from zunzun.com using DPF_out_mean
    # coefficients
    a = 3.1863877879345913E+12
    b = -1.8334716040160675E+01
    c = 4.4913908739937525E-01

    return a * numpy.exp(-0.5 * numpy.power((numpy.log(x_in)-b) / c, 2.0))

out_plot