Why does scipy.stats.rv_continuous pick the upper bound so often?

Posted: 2016-03-08 10:40:10

Tags: python python-2.7 scipy

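I have included the code I wrote below. For some reason, the upper bound of 0.804 is oversampled compared with the initial distribution. This happens for both of the distributions I am using.

Is this a common problem with rv_continuous, or am I missing something?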

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st

# Disk galaxies: custom density on [0, 0.804]
class Disk_pdf(st.rv_continuous):
    def _pdf(self, x):
        return (x*(1-np.exp((x-0.804)/0.2539)))/((1+x)*(x**2+0.0256**2)**0.5)

Disk_cv = Disk_pdf(a=0, b=0.804, name='Disk_pdf')
Disk_dist = Disk_cv.rvs(size=10000)
plt.figure()
plt.hist(Disk_dist, 100)

# Bulge-dominated galaxies: custom density on [0, 0.804]
class Bulge_pdf(st.rv_continuous):
    def _pdf(self, x):
        return x*np.exp(-2.368*x-6.691*x**2)

Bulge_cv = Bulge_pdf(a=0, b=0.804, name='Bulge_pdf')
Bulge_dist = Bulge_cv.rvs(size=10000)
plt.figure()
plt.hist(Bulge_dist, 100)
plt.show()

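Images of the initial distributions and of the histograms created with rv_continuous are shown below. I have two histogram images for each: one is zoomed in to show that, apart from the oversampled upper bound, the method captures the distribution; the other shows the histogram on its full y scale, to show how severe the oversampling is.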

[Figure: Initial distribution of disk galaxies, and histograms made using rv_continuous, which have an oversampled upper bound.]

[Figure: Initial distribution of bulge-dominated galaxies, and histograms made using rv_continuous, which have an oversampled upper bound.]
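A minimal sketch to put a number on the pile-up seen in the histograms: it redefines the question's Disk_pdf, draws samples, and counts how many land on the upper bound 0.804. The fixed seed, the small sample size (the default sampler for a custom distribution is slow), and the 1e-6 tolerance are arbitrary choices, not part of the question.

```python
import numpy as np
import scipy.stats as st


class Disk_pdf(st.rv_continuous):
    # Same density as in the question, unchanged.
    def _pdf(self, x):
        return (x * (1 - np.exp((x - 0.804) / 0.2539))) / ((1 + x) * (x**2 + 0.0256**2) ** 0.5)


disk_cv = Disk_pdf(a=0, b=0.804, name='Disk_pdf')

# Fixed seed so the check is reproducible (the seed value itself is arbitrary).
samples = disk_cv.rvs(size=300, random_state=0)

# The density goes to 0 at x = 0.804, so genuine samples within 1e-6 of the
# bound are extremely unlikely; anything this close is part of the pile-up.
at_bound = np.isclose(samples, disk_cv.b, atol=1e-6)
fraction_at_bound = at_bound.mean()
print("fraction of samples at the upper bound: %.3f" % fraction_at_bound)
```

If the sampler were reproducing the density faithfully, this fraction would be essentially zero.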

1 answer:

Answer 0 (score: 1)

The pdf must be normalized (it has to integrate to 1 over [a, b]), and yours does not seem to be:

In [6]: from scipy.integrate import quad

In [7]: quad(Disk_cv.pdf, 0, 0.804)
Out[7]: (0.41121809643549406, 4.005573481922018e-09)
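Why this shows up as a spike at 0.804: as I understand rv_continuous's default machinery, when only _pdf is defined, the cdf is computed by integrating _pdf from a, and rvs draws samples by numerically inverting that cdf. The public cdf returns 1 for x >= b, so with an integral of only about 0.41 the cdf jumps from about 0.41 to 1 at x = 0.804, and every uniform draw above roughly 0.41 is mapped onto the upper bound. rv_continuous does not rescale _pdf for you. A minimal sketch of one fix, assuming a numerically computed normalization constant is accurate enough: compute Z, the integral of the raw density over [0, 0.804], once with quad, and divide _pdf by it. The helper names disk_unnormalized, bulge_unnormalized, DISK_Z and BULGE_Z are hypothetical; the densities are copied from the question.

```python
import numpy as np
import scipy.stats as st
from scipy.integrate import quad

A, B = 0.0, 0.804  # support used in the question


def disk_unnormalized(x):
    # Disk density from the question, before normalization.
    return (x * (1 - np.exp((x - B) / 0.2539))) / ((1 + x) * (x**2 + 0.0256**2) ** 0.5)


def bulge_unnormalized(x):
    # Bulge density from the question, before normalization.
    return x * np.exp(-2.368 * x - 6.691 * x**2)


# Normalization constants: integral of each raw density over [A, B].
DISK_Z = quad(disk_unnormalized, A, B)[0]
BULGE_Z = quad(bulge_unnormalized, A, B)[0]


class Disk_pdf(st.rv_continuous):
    def _pdf(self, x):
        # Dividing by Z makes the pdf integrate to 1 on [a, b].
        return disk_unnormalized(x) / DISK_Z


class Bulge_pdf(st.rv_continuous):
    def _pdf(self, x):
        return bulge_unnormalized(x) / BULGE_Z


disk_cv = Disk_pdf(a=A, b=B, name='Disk_pdf')
bulge_cv = Bulge_pdf(a=A, b=B, name='Bulge_pdf')

# Small sample sizes because numerical cdf inversion is slow.
disk_samples = disk_cv.rvs(size=300, random_state=0)
bulge_samples = bulge_cv.rvs(size=300, random_state=1)

for name, cv, s in [('disk', disk_cv, disk_samples), ('bulge', bulge_cv, bulge_samples)]:
    total = quad(cv.pdf, A, B)[0]
    frac = np.isclose(s, B, atol=1e-6).mean()
    print("%s: integral of pdf = %.6f, fraction at upper bound = %.3f" % (name, total, frac))
```

Both integrals should come out as 1 to within quad's tolerance, and neither sample should pile up at 0.804. Computing Z once at module level avoids repeating the quad call on every pdf evaluation.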