峰值拟合与高斯混合模型(Scikit);如何从离散的pdf中提取样本?

时间:2015-06-10 14:39:35

标签: python scipy scikit-learn

据我所知,scikit中的高斯混合模型学习期望从由几个高斯组成的分布中抽取样本。例如astroML

他们使用gmm生成样本:

from matplotlib import pyplot as plt
import numpy as np
from sklearn.mixture import GMM

np.random.seed(1)

gmm = GMM(3, n_iter=1)
gmm.means_ = np.array([[-1], [0], [3]])
gmm.covars_ = np.array([[1.5], [1], [0.5]]) ** 2
gmm.weights_ = np.array([0.3, 0.5, 0.2])

X = gmm.sample(1000)

基本上,它们将高斯混合模型拟合到数据,如(伪代码):

 GMM(number of peaks).fit(X)

X看起来像:

Randomly drawn samples from gmm

X的直方图:

plt.hist(X, 30, normed=True, histtype='stepfilled', alpha=0.4)

Histogram of X

我的数据看起来不一样。我没有从分发中抽取样本。测量设备已经为我提供了离散的概率分布(粒度):

Particle size distribution

如何转换离散pdf?

我试过了:

from scipy import stats
custm = stats.rv_discrete(name='custm', values=(x, y))

custm.pmf(xk)不会重现分发。

我的数据:

x = np.array([  1.00074269e-02,   1.13692409e-02,   1.29163711e-02,
         1.46740353e-02,   1.66708829e-02,   1.89394623e-02,
         2.15167508e-02,   2.44447575e-02,   2.77712084e-02,
         3.15503239e-02,   3.58437026e-02,   4.07213258e-02,
         4.62626976e-02,   5.25581412e-02,   5.97102709e-02,
         6.78356648e-02,   7.70667650e-02,   8.75540365e-02,
         9.94684195e-02,   1.13004116e-01,   1.28381754e-01,
         1.45851987e-01,   1.65699574e-01,   1.88248029e-01,
         2.13864884e-01,   2.42967690e-01,   2.76030816e-01,
         3.13593183e-01,   3.56267049e-01,   4.04747990e-01,
         4.59826233e-01,   5.22399543e-01,   5.93487850e-01,
         6.74249878e-01,   7.66002029e-01,   8.70239845e-01,
         9.88662378e-01,   1.12319989e+00,   1.27604531e+00,
         1.44968999e+00,   1.64696429e+00,   1.87108375e+00,
         2.12570146e+00,   2.41496764e+00,   2.74359726e+00,
         3.11694690e+00,   3.54110209e+00,   4.02297646e+00,
         4.57042446e+00,   5.19236937e+00,   5.89894875e+00,
         6.70167970e+00,   7.61364654e+00,   8.64971415e+00,
         9.82677018e+00,   1.11640004e+01,   1.26832013e+01,
         1.44091357e+01,   1.63699358e+01,   1.85975622e+01,
         2.11283247e+01,   2.40034743e+01,   2.72698752e+01,
         3.09807691e+01,   3.51966426e+01,   3.99862137e+01,
         4.54275512e+01,   5.16093477e+01,   5.86323653e+01,
         6.66110774e+01,   7.56755354e+01,   8.59734879e+01,
         9.76727893e+01,   1.10964136e+02,   1.26064173e+02,
         1.43219028e+02,   1.62708322e+02,   1.84849725e+02,
         2.10004139e+02,   2.38581573e+02,   2.71047834e+02,
         3.07932115e+02,   3.49835622e+02,   3.97441372e+02,
         4.51525329e+02,   5.12969048e+02,   5.82774050e+02,
         6.62078141e+02,   7.52173958e+02,   8.54530045e+02,
         9.70814783e+02,   1.10292359e+03,   1.25300980e+03,
         1.42351980e+03,   1.61723286e+03,   1.83730645e+03,
         2.08732774e+03,   2.37137201e+03,   2.69406911e+03,
         3.06067895e+03,   3.47717718e+03])

y = array([ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.09,  0.18,  0.31,
        0.48,  0.69,  0.94,  1.21,  1.49,  1.76,  2.  ,  2.2 ,  2.36,
        2.47,  2.56,  2.63,  2.69,  2.76,  2.84,  2.91,  2.99,  3.05,
        3.1 ,  3.13,  3.14,  3.11,  3.06,  2.96,  2.81,  2.63,  2.42,
        2.21,  2.03,  1.94,  1.95,  2.09,  2.34,  2.66,  2.97,  3.18,
        3.22,  3.04,  2.64,  2.07,  1.43,  0.83,  0.36,  0.09,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ])

0 个答案:

没有答案